Abstract

We study confidentiality enforcement in ontologies under the Controlled Query Evaluation framework, where a policy specifies the sensitive information and a censor ensures that query answers that may compromise the policy are not returned. We focus on censors that ensure confidentiality while maximising information access, and consider both Datalog and the OWL 2 profiles as ontology languages.

1 Introduction 

As semantic technologies are becoming increasingly mature, 
there is a need for mechanisms to ensure that confidential data 
is only accessible by authorised users. 

Controlled Query Evaluation (CQE) is a prominent confidentiality enforcement framework, in which sensitive information is declaratively specified by means of a policy and confidentiality is enforced by a censor. When given a query, a censor checks whether returning the answer may lead to a policy violation, in which case it returns a distorted answer. The CQE framework was introduced in


We study CQE for ontologies that are expressed in the rule language Datalog as well as in the lightweight description logics (DLs) underpinning the standardised profiles of OWL 2 [Motik et al., 2012]. We assume that data is hidden, and users access the system through a query interface. An ontology, which is known to users, provides the vocabulary and background knowledge needed for users to formulate queries, as well as to enrich query answers with implicit information. Policies, formalised as conjunctive queries, are available only to system administrators, but not to ordinary users. The role of the censor is to preserve confidentiality by filtering out those answers to user queries that could lead to a policy violation. In this setting, there is a danger that confidentiality enforcement may over-restrict the access of the user. Thus, we focus on optimal censors, which maximise answers to queries while ensuring confidentiality of the policy.

We are especially interested in censors that can be realised by off-the-shelf reasoning infrastructure. To fulfil this requirement, we introduce in Section 4 view and obstruction censors. View censors return only answers that follow from the ontology and an anonymised dataset (a view) where some occurrences of constants may have been replaced with labelled nulls. The censor answers faithfully all queries against the view; thus, any information not captured by the view is inaccessible by default. View censors may require materialisation of implicit data, and hence are well-suited for applications where materialisation is feasible. Obstruction censors are defined by a set of "forbidden query patterns" (an obstruction), and no answer instantiating such a pattern is returned to users. These censors do not require data modification and are well-suited for applications such as Ontology-Based Data Access (OBDA), where data is managed by an RDBMS. Obstruction censors are dual to view censors in the sense that they specify the information that users are denied access to. We formally characterise this duality, and show that their capabilities are incomparable.

In Section 5 we investigate the limitations of view censors and show that checking the existence of an optimal view is undecidable for Datalog ontologies. We then study fragments of Datalog for which optimal views always exist and extend our results to OWL 2 profile ontologies. In Section 6 we focus on obstruction censors, and provide sufficient and necessary conditions for an optimal censor to exist. Then, we propose a tractable algorithm for computing optimal obstruction censors for linear Datalog ontologies and apply our results to OWL 2 QL ontologies.


2 Preliminaries 

We adopt standard notions in first-order logic over function-free finite signatures. Our focus is on ontologies, so we assume signatures with predicates of arity at most two. We treat equality $\approx$ as an ordinary predicate, but assume that any set of formulae containing $\approx$ also contains all the axioms of $\approx$ for its signature.

Datasets and Ontologies A dataset is a finite set of facts (i.e., ground atoms). An ontology is a finite set of rules, that is, formulae of the form

$$\varphi(\vec{x}) \to \exists \vec{y}.\psi(\vec{x},\vec{y}),$$

where the body $\varphi(\vec{x})$ and the head $\psi(\vec{x},\vec{y})$ are conjunctions of atoms, and variables $\vec{x}$ are implicitly universally quantified. We restrict ourselves to ontologies $\mathcal{O}$ and datasets $\mathcal{D}$ such that $\mathcal{O} \cup \mathcal{D}$ is satisfiable, which ensures that answers to queries are meaningful. A rule is

- Datalog if the head has a single atom and $\vec{y}$ is empty;

- guarded if the body has an atom (guard) with all variables $\vec{x}$;

- linear if the body has a single atom;

- multi-linear if the body contains only guards;

- tree-shaped if the undirected multigraph with an edge $\{t_1, t_2\}$ for each binary body atom $R(t_1, t_2)$ is a tree.

An ontology is of a type above if so are all the rules in it.

Table 1: OWL 2 profile axioms as rules

(1) $A(x) \wedge R(x,y_1) \wedge B(y_1) \wedge R(x,y_2) \wedge B(y_2) \to y_1 \approx y_2$,

(2) $R(x,y) \to S(x,y)$, (3) $A(x) \to \exists y.[R(x,y) \wedge B(y)]$,

(4) $A(x) \to x \approx a$, (5) $R(x,y) \wedge S(y,z) \to T(x,z)$,

(6) $A(x) \wedge B(x) \to C(x)$, (7) $A(x) \wedge R(x,y) \to B(y)$,

(8) $R(x,y) \to S(y,x)$, (9) $R(x,a) \to B(x)$,

(10) $R(x,y) \to A(y)$, (11) $A(x) \to R(x,a)$,

(12) $A(x) \to B(x)$, (13) $R(x,y) \wedge B(y) \to A(x)$.

OWL 2 Profiles Table 1 provides the types of rules sufficient to capture the axioms in the OWL 2 RL, EL, and QL profiles. We treat the $\top$ concept in DLs as a unary predicate and assume that each ontology contains the rule $S(\vec{x}) \to \top(x)$ for each predicate $S$ and variable $x$ from $\vec{x}$. An ontology consisting of rules in Table 1 is

- RL if it has no rules of type (3);

- QL if it only has rules of types (2), (3), (8), (10), (12);

- EL if it has no rules of types (1), (7), (8).
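The rule shapes and profile conditions above are purely syntactic, so they can be checked mechanically. The following Python sketch is illustrative only: encoding atoms as `(predicate, args)` tuples, marking variables with a `?` prefix, and the names `rule_shapes` and `profiles` are assumptions of this sketch, as is taking all body terms as the vertices of the multigraph in the tree-shaped condition.

```python
def is_var(term):
    return term.startswith("?")


def _connected(terms, edges):
    """True if the undirected multigraph with these vertices and edges is connected."""
    adj = {t: set() for t in terms}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    start = next(iter(terms))
    seen, stack = {start}, [start]
    while stack:
        for n in adj[stack.pop()]:
            if n not in seen:
                seen.add(n)
                stack.append(n)
    return seen == terms


def rule_shapes(body, head, exist_vars=()):
    """Shapes of the rule body -> exists exist_vars. head (body, head: lists of atoms)."""
    body_vars = {t for _, args in body for t in args if is_var(t)}
    guards = [atom for atom in body if body_vars <= set(atom[1])]
    shapes = set()
    if len(head) == 1 and not exist_vars:
        shapes.add("Datalog")
    if guards:
        shapes.add("guarded")
    if len(body) == 1:
        shapes.add("linear")
    if len(guards) == len(body):
        shapes.add("multi-linear")
    # Assumption: the multigraph's vertices are all terms occurring in the body.
    terms = {t for _, args in body for t in args}
    edges = [args for _, args in body if len(args) == 2]
    # A connected multigraph is a tree iff it has |V| - 1 edges.
    if terms and len(edges) == len(terms) - 1 and _connected(terms, edges):
        shapes.add("tree-shaped")
    return shapes


def profiles(rule_types):
    """OWL 2 profiles of an ontology, given the Table 1 rule types (1-13) it uses."""
    types = set(rule_types)
    found = set()
    if 3 not in types:
        found.add("RL")
    if types <= {2, 3, 8, 10, 12}:
        found.add("QL")
    if not types & {1, 7, 8}:
        found.add("EL")
    return found


# Rule (5) of Table 1 is tree-shaped but not guarded.
print(sorted(rule_shapes([("R", ("?x", "?y")), ("S", ("?y", "?z"))], [("T", ("?x", "?z"))])))
# ['Datalog', 'tree-shaped']
print(sorted(profiles({2, 3, 12})))  # ['EL', 'QL']
```

A multi-edge or a self-loop adds an edge without adding a vertex, so the edge-count test rejects it, as the tree condition requires.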

Queries A conjunctive query (CQ) with free variables $\vec{x}$ is a formula $Q(\vec{x})$ of the form $\exists \vec{y}.\varphi(\vec{x},\vec{y})$, with the body $\varphi(\vec{x},\vec{y})$ a conjunction of atoms. A union of CQs (UCQ) is a disjunction of CQs with the same free variables. Queries with no free variables are Boolean. A tuple of constants $\vec{a}$ is a (certain) answer to $Q(\vec{x})$ over an ontology $\mathcal{O}$ and a dataset $\mathcal{D}$ if $\mathcal{O} \cup \mathcal{D} \models Q(\vec{a})$. The set of answers to $Q(\vec{x})$ over $\mathcal{O}$ and $\mathcal{D}$ is denoted by $\mathrm{cert}(Q, \mathcal{O}, \mathcal{D})$.
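For Datalog ontologies, certain answers can be obtained by evaluating the query over the least model of $\mathcal{O} \cup \mathcal{D}$. The following Python sketch does this by naive forward chaining. It is a minimal illustration rather than an efficient reasoner and it ignores equality rules. Its encoding (atoms as `(predicate, args)` tuples, variables prefixed with `?`) and its function names are assumptions.

```python
def is_var(term):
    return term.startswith("?")


def match(atoms, facts, subst):
    """Yield every extension of subst that maps all atoms into facts."""
    if not atoms:
        yield subst
        return
    (pred, args), rest = atoms[0], atoms[1:]
    for fpred, fargs in facts:
        if fpred != pred or len(fargs) != len(args):
            continue
        s, ok = dict(subst), True
        for t, c in zip(args, fargs):
            ok = s.setdefault(t, c) == c if is_var(t) else t == c
            if not ok:
                break
        if ok:
            yield from match(rest, facts, s)


def materialise(rules, dataset):
    """Least Herbrand model of Datalog rules (body, head) and a dataset."""
    model = set(dataset)
    changed = True
    while changed:
        changed = False
        for body, (hpred, hargs) in rules:
            for s in list(match(body, model, {})):
                fact = (hpred, tuple(s.get(t, t) for t in hargs))
                if fact not in model:
                    model.add(fact)
                    changed = True
    return model


def cert(answer_vars, body, rules, dataset):
    """cert(Q, O, D) for Q(x) = exists y. body, where x = answer_vars."""
    model = materialise(rules, dataset)
    return {tuple(s[v] for v in answer_vars) for s in match(body, model, {})}


rules = [
    ([("A", ("?x",)), ("R", ("?x", "?y"))], ("B", ("?y",))),  # type (7)
    ([("R", ("?x", "?y"))], ("S", ("?y", "?x"))),              # type (8)
]
data = {("A", ("a",)), ("R", ("a", "b"))}
print(cert(["?x"], [("S", ("?x", "?y")), ("B", ("?x",))], rules, data))  # {('b',)}
```

Here `rules` encodes one rule of type (7) and one of type (8) from Table 1 over a toy dataset. Both derived facts $B(b)$ and $S(b, a)$ are needed to obtain the answer $b$.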


3 Basic Framework 


We assume that the dataset $\mathcal{D}$ is hidden while the ontology $\mathcal{O}$ is known to all users. It is assumed that system administrators are in charge of specifying policies as CQs, and that policies are assigned to users by standard mechanisms such as role-based access control [Sandhu et al., 1996].


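Definition 1. A CQE instance $\mathcal{I}$ is a triple $(\mathcal{O}, \mathcal{D}, P)$, with $\mathcal{O}$ an ontology, $\mathcal{D}$ a dataset, and $P$ a CQ, which is called the policy. The instance $\mathcal{I}$ is Datalog, guarded, etc. if so is the ontology $\mathcal{O} \cup \{\varphi(\vec{x}, \vec{y}) \to A_P(\vec{x})\}$, where $\varphi(\vec{x}, \vec{y})$ is the body of $P$ and $A_P$ a fresh predicate.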


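Example 2. Consider the following ontology and dataset that describe an excerpt of a social network:

$$\mathcal{O}_{ex} = \{\mathit{Likes}(x,y) \wedge \mathit{Thr}(y) \to \mathit{ThrFan}(x),\ \mathit{Susp}(x) \wedge \mathit{Cr}(x) \to \mathit{Thr}(x),\ \mathit{FoF}(x,y) \to \mathit{FoF}(y,x)\},$$

$$\mathcal{D}_{ex} = \{\mathit{FoF}(\mathit{John},\mathit{Bob}),\ \mathit{FoF}(\mathit{Bob},\mathit{Mary}),\ \mathit{Cr}(\mathit{Seven}),\ \mathit{Likes}(\mathit{John},\mathit{Seven}),\ \mathit{Likes}(\mathit{Bob},\mathit{Seven}),\ \mathit{Susp}(\mathit{Seven})\}.$$

Here, the ontology $\mathcal{O}_{ex}$ states, for example, that people who like thrillers are thriller fans, or that friendship is a symmetric relation. Then, a policy $P_{ex} = \mathit{FoF}(\mathit{John}, x)$ forbids access to John's friend list. ◊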

A key component of a CQE system is the censor, whose 
goal is to decide according to the policy which query answers 
can be safely returned to users. 

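Definition 3. A censor for a CQE instance $(\mathcal{O}, \mathcal{D}, P)$ is a function $\mathit{cens}$ mapping each CQ $Q$ to a subset of $\mathrm{cert}(Q, \mathcal{O}, \mathcal{D})$. The theory $\mathit{Th}_{\mathit{cens}}$ of $\mathit{cens}$ is the set $\{Q(\vec{a}) \mid \vec{a} \in \mathit{cens}(Q) \text{ and } Q(\vec{x}) \text{ is a CQ}\}$.

Censor $\mathit{cens}$ is confidentiality preserving if $\mathcal{O} \cup \mathit{Th}_{\mathit{cens}} \not\models P(\vec{a})$ for each tuple $\vec{a}$ of constants. It is optimal if

- it is confidentiality preserving, and

- no confidentiality preserving censor $\mathit{cens}' \neq \mathit{cens}$ exists such that $\mathit{cens}(Q) \subseteq \mathit{cens}'(Q)$ for every CQ $Q$.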

Intuitively, $\mathit{Th}_{\mathit{cens}}$ represents all the information that a user can gather by asking CQs to the system. If the censor is confidentiality preserving, then no information can be obtained about the policy, regardless of the number of CQs asked. In this way, optimal censors maximise information accessibility without compromising the policy.

4 View and Obstruction Censors 

The idea behind view censors is to modify the dataset by anonymising occurrences of constants as well as by adding or removing facts, whenever needed. We refer to such a modified dataset as an (anonymisation) view. The censor returns only the answers that follow from the ontology and view; in this way, the main workload of the censor amounts to the computation of certain answers, which can be delegated to the query answering engine.

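Definition 4. A view $\mathcal{V}$ for $\mathcal{I} = (\mathcal{O}, \mathcal{D}, P)$ is a dataset over the signature of $\mathcal{I}$ extended with a set of fresh constants. The view censor $\mathit{vcens}_{\mathcal{V}}$ is the function mapping each CQ $Q(\vec{x})$ to the set $\mathrm{cert}(Q, \mathcal{O}, \mathcal{D}) \cap \mathrm{cert}(Q, \mathcal{O}, \mathcal{V})$. The view is optimal if so is its corresponding censor.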

Clearly, for the censor to be confidentiality preserving, $\mathcal{O} \cup \mathcal{V}$ must not entail any answer to the policy. On the other hand, to ensure optimality, a view must encode as much information from the hidden dataset as possible.

Example 5. Consider the view $\mathcal{V}_{ex}$ obtained from $\mathcal{D}_{ex}$ by replacing Bob with a fresh constant $a_b$. Intuitively, $\mathcal{V}_{ex}$ is the result of "anonymising" the constant Bob, while keeping the structure of the data intact. Since $\mathcal{V}_{ex}$ contains no information about Bob, we have $\mathrm{cert}(P_{ex}, \mathcal{O}_{ex}, \mathcal{V}_{ex}) = \emptyset$ and the censor based on $\mathcal{V}_{ex}$ is confidentiality preserving. View $\mathcal{V}_{ex}$, however, is not optimal: for instance, $\mathcal{O}_{ex} \cup \mathcal{V}_{ex}$ does not entail the fact $\mathit{Likes}(\mathit{Bob}, \mathit{Seven})$, which can be added to the view without violating confidentiality. ◊
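To make Examples 2 and 5 concrete, the Python sketch below computes the view censor $\mathrm{cert}(Q, \mathcal{O}, \mathcal{D}) \cap \mathrm{cert}(Q, \mathcal{O}, \mathcal{V})$ with naive forward chaining. It is illustrative only. Atoms are encoded as `(predicate, args)` tuples with `?`-prefixed variables, and the fresh constant is named `a_b`. These choices and the helper names are assumptions of the sketch.

```python
def is_var(term):
    return term.startswith("?")


def match(atoms, facts, subst):
    """Yield every extension of subst that maps all atoms into facts."""
    if not atoms:
        yield subst
        return
    (pred, args), rest = atoms[0], atoms[1:]
    for fpred, fargs in facts:
        if fpred != pred or len(fargs) != len(args):
            continue
        s, ok = dict(subst), True
        for t, c in zip(args, fargs):
            ok = s.setdefault(t, c) == c if is_var(t) else t == c
            if not ok:
                break
        if ok:
            yield from match(rest, facts, s)


def materialise(rules, dataset):
    """Least Herbrand model by naive forward chaining."""
    model, changed = set(dataset), True
    while changed:
        changed = False
        for body, (hpred, hargs) in rules:
            for s in list(match(body, model, {})):
                fact = (hpred, tuple(s.get(t, t) for t in hargs))
                if fact not in model:
                    model.add(fact)
                    changed = True
    return model


def answers(answer_vars, body, rules, dataset):
    model = materialise(rules, dataset)
    return {tuple(s[v] for v in answer_vars) for s in match(body, model, {})}


def vcens(answer_vars, body, rules, dataset, view):
    """View censor: answers over the hidden dataset that also hold over the view."""
    return answers(answer_vars, body, rules, dataset) & answers(answer_vars, body, rules, view)


O_EX = [
    ([("Likes", ("?x", "?y")), ("Thr", ("?y",))], ("ThrFan", ("?x",))),
    ([("Susp", ("?x",)), ("Cr", ("?x",))], ("Thr", ("?x",))),
    ([("FoF", ("?x", "?y"))], ("FoF", ("?y", "?x"))),
]
D_EX = {
    ("FoF", ("John", "Bob")), ("FoF", ("Bob", "Mary")), ("Cr", ("Seven",)),
    ("Likes", ("John", "Seven")), ("Likes", ("Bob", "Seven")), ("Susp", ("Seven",)),
}
# Replace Bob by the (hypothetical) fresh constant "a_b".
V_EX = {(p, tuple("a_b" if t == "Bob" else t for t in args)) for p, args in D_EX}
POLICY = (["?x"], [("FoF", ("John", "?x"))])
THR_FAN = (["?x"], [("ThrFan", ("?x",))])

print(vcens(*POLICY, O_EX, D_EX, V_EX))   # set()
print(vcens(*THR_FAN, O_EX, D_EX, V_EX))  # {('John',)}
V_PLUS = V_EX | {("Likes", ("Bob", "Seven"))}
print(sorted(vcens(*THR_FAN, O_EX, D_EX, V_PLUS)))  # [('Bob',), ('John',)]
print(vcens(*POLICY, O_EX, D_EX, V_PLUS))  # set()
```

Because the censor intersects with the answers over the hidden dataset, the fresh constant `a_b` is never returned. Adding $\mathit{Likes}(\mathit{Bob}, \mathit{Seven})$ to the view recovers the answer Bob to the query $\mathit{ThrFan}(x)$ while the policy answer stays hidden.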





The idea behind obstruction censors is to associate to a CQE instance a Boolean UCQ $\mathcal{U}$ s.t. the censor returns an answer $\vec{a}$ to a CQ $Q(\vec{x})$ only if no CQ in $\mathcal{U}$ follows from $Q(\vec{a})$. Thus, the obstruction can be seen as a set of forbidden query patterns, which should not be disclosed.

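Definition 6. An obstruction $\mathcal{U}$ for $\mathcal{I} = (\mathcal{O}, \mathcal{D}, P)$ is a Boolean UCQ. The obstruction censor $\mathit{ocens}_{\mathcal{U}}$ based on $\mathcal{U}$ is the function that maps each CQ $Q(\vec{x})$ to the set

$$\{\vec{a} \mid \vec{a} \in \mathrm{cert}(Q, \mathcal{O}, \mathcal{D}) \text{ and } Q(\vec{a}) \not\models \mathcal{U}\}.$$

The obstruction is optimal if so is its censor $\mathit{ocens}_{\mathcal{U}}$.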

Similarly to view censors, obstruction censors do not require dedicated algorithms; checking $Q(\vec{a}) \not\models \mathcal{U}$ can be delegated to an RDBMS. Obstructions can be maintained virtually and do not require data materialisation.

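Example 7. The censor based on $\mathcal{V}_{ex}$ from Example 5 can also be realised with the following obstruction $\mathcal{U}_{ex}$:

$$\exists x.\mathit{FoF}(x, \mathit{Bob}) \vee \exists x.\mathit{FoF}(\mathit{Bob}, x) \vee \exists x.\mathit{Likes}(\mathit{Bob}, x) \vee \mathit{ThrFan}(\mathit{Bob}).$$

Intuitively, $\mathcal{U}_{ex}$ "blocks" query answers involving Bob, and all other answers are the same as over $\mathcal{O}_{ex} \cup \mathcal{D}_{ex}$. ◊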
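Checking $Q(\vec{a}) \not\models \mathcal{U}$ amounts to testing whether some disjunct of $\mathcal{U}$ maps homomorphically into the canonical structure of $Q(\vec{a})$, in which variables are frozen into fresh constants. The Python sketch below applies this idea to Example 7. It is a hedged illustration. The least model of $\mathcal{O}_{ex} \cup \mathcal{D}_{ex}$ is written out by hand in `MODEL` rather than computed. Frozen variables get a `_` prefix and are assumed not to clash with real constants. All names are hypothetical.

```python
def is_var(term):
    return term.startswith("?")


def match(atoms, facts, subst):
    """Yield every extension of subst that maps all atoms into facts."""
    if not atoms:
        yield subst
        return
    (pred, args), rest = atoms[0], atoms[1:]
    for fpred, fargs in facts:
        if fpred != pred or len(fargs) != len(args):
            continue
        s, ok = dict(subst), True
        for t, c in zip(args, fargs):
            ok = s.setdefault(t, c) == c if is_var(t) else t == c
            if not ok:
                break
        if ok:
            yield from match(rest, facts, s)


def freeze(atoms):
    """Canonical structure of a CQ: each variable ?v becomes the constant _v."""
    return {(p, tuple("_" + t[1:] if is_var(t) else t for t in args)) for p, args in atoms}


def entails(target, cq):
    """target |= cq (Boolean CQs) iff cq maps homomorphically into freeze(target)."""
    return next(match(cq, freeze(target), {}), None) is not None


def ocens(answer_vars, body, model, obstruction):
    """Obstruction censor: answers a such that Q(a) entails no disjunct."""
    result = set()
    for s in match(body, model, {}):
        a = tuple(s[v] for v in answer_vars)
        # Q(a): substitute the answer variables only; existential ones stay variables.
        q_a = [(p, tuple(s[t] if t in answer_vars else t for t in args)) for p, args in body]
        if not any(entails(q_a, u) for u in obstruction):
            result.add(a)
    return result


# Least Herbrand model of O_ex and D_ex from Example 2, derived by hand.
MODEL = {
    ("FoF", ("John", "Bob")), ("FoF", ("Bob", "John")),
    ("FoF", ("Bob", "Mary")), ("FoF", ("Mary", "Bob")),
    ("Cr", ("Seven",)), ("Susp", ("Seven",)), ("Thr", ("Seven",)),
    ("Likes", ("John", "Seven")), ("Likes", ("Bob", "Seven")),
    ("ThrFan", ("John",)), ("ThrFan", ("Bob",)),
}
U_EX = [
    [("FoF", ("?x", "Bob"))],
    [("FoF", ("Bob", "?x"))],
    [("Likes", ("Bob", "?x"))],
    [("ThrFan", ("Bob",))],
]

print(ocens(["?x"], [("FoF", ("John", "?x"))], MODEL, U_EX))        # set()
print(ocens(["?x"], [("ThrFan", ("?x",))], MODEL, U_EX))            # {('John',)}
print(sorted(ocens(["?x"], [("FoF", ("?x", "?y"))], MODEL, U_EX)))  # [('John',), ('Mary',)]
```

The last query shows that the obstruction still reveals that John and Mary have some friend, since neither $\exists y.\mathit{FoF}(\mathit{John}, y)$ nor $\exists y.\mathit{FoF}(\mathit{Mary}, y)$ entails a disjunct of $\mathcal{U}_{ex}$.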

Examples 5 and 7 show that the same censor may be based on both a view and an obstruction. These censors, however, behave dually: a view explicitly encodes the information accessible to users, whereas obstructions specify information which users are denied access to. It is not obvious whether (and how) a view can be realised by an obstruction, or vice versa. We next focus on Datalog ontologies and characterise when a view $\mathcal{V}$ and an obstruction $\mathcal{U}$ yield the same censor. We start with a few definitions.

Each Datalog ontology $\mathcal{O}$ and dataset $\mathcal{D}$ have a unique least Herbrand model $\mathcal{H}_{\mathcal{O},\mathcal{D}}$: a finite structure satisfying $\vec{a} \in \mathrm{cert}(Q, \mathcal{O}, \mathcal{D})$ iff $\mathcal{H}_{\mathcal{O},\mathcal{D}} \models Q(\vec{a})$ for every CQ $Q$. Thus, this model captures all the information relevant to CQ answering. A natural specification of the duality between views and obstructions is then as follows: $\mathcal{U}$ and $\mathcal{V}$ implement the same censor if and only if $\mathcal{U}$ captures the structures not homomorphically embeddable into $\mathcal{H}_{\mathcal{O},\mathcal{V}}$. To formalise this statement, we recall the central problem in (non-uniform) constraint satisfaction theory.

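Definition 8 (Kolaitis and Vardi, 2008). Let $\mathcal{C}$ be a class of finite structures and let $\mathcal{C}'$ be a subset of $\mathcal{C}$. A first-order sentence $\varphi$ defines $\mathcal{C}'$ if $\mathcal{I} \in \mathcal{C}'$ is equivalent to $\mathcal{I} \models \varphi$ for every structure $\mathcal{I} \in \mathcal{C}$.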

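Let $\mathcal{J} \to \mathcal{J}'$ denote the fact that there is a homomorphism from a structure $\mathcal{J}$ to a structure $\mathcal{J}'$, and $\mathcal{J} \not\to \mathcal{J}'$ that there is none. The correspondence is given in the following theorem.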

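Theorem 9. Let $\mathcal{I} = (\mathcal{O}, \mathcal{D}, P)$ be Datalog and $\mathcal{C} = \{\mathcal{I} \mid \mathcal{I} \text{ finite}, \mathcal{I} \to \mathcal{H}_{\mathcal{O},\mathcal{D}}\}$. Then, $\mathit{vcens}_{\mathcal{V}} = \mathit{ocens}_{\mathcal{U}}$ iff $\mathcal{U}$ defines the set $\mathcal{C} \setminus \{\mathcal{I} \in \mathcal{C} \mid \mathcal{I} \to \mathcal{H}_{\mathcal{O},\mathcal{V}}\}$.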

Using this theorem together with definability results in Finite Model Theory, we can show that views and obstructions cannot simulate one another in general.


Theorem 10. There is a Datalog CQE instance admitting 
a confidentiality preserving view censor that is not based on 
any obstruction. Conversely, there is a Datalog CQE instance 
admitting a confidentiality preserving obstruction censor that 
is not based on any view. 

5 Optimal View Censors 

Our discussion in Section 4 suggests that view and obstruction censors must be studied independently. In this section we focus on view censors and start by establishing their theoretical limitations. The following example shows that optimal view censors may not exist, even if we restrict ourselves to empty ontologies.

Example 11. Consider a CQE instance with empty ontology, dataset consisting of a fact $R(a, a)$, and policy $P = \exists x \exists y \exists z.R(x,y) \wedge R(y,z) \wedge R(z,x)$. Consider also the family of Boolean CQs $Q_n = \exists x_1 \dots \exists x_n. \bigwedge_{1 \le i < j \le n} R(x_i, x_j)$, which intuitively represent strict total orders on $n$ elements. Answering these queries positively is harmless: $\mathcal{V} \cup \{Q_n\}_{n \ge 1} \not\models P$ for any confidentiality preserving view $\mathcal{V}$. Assume now that $\mathcal{V}$ is optimal, and let $m$ be the number of constants in $\mathcal{V}$. Then, $\mathcal{V} \not\models Q_{m+1}$: otherwise, by the pigeonhole principle, two variables of $Q_{m+1}$ would be mapped to the same constant, so $\mathcal{V}$ would encode a self-loop and violate the policy. This contradicts the optimality of $\mathcal{V}$, and hence no optimal view exists. ◊
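The pigeonhole step in Example 11 can be checked by brute force on small structures. With an empty ontology, certain answers reduce to query evaluation over the view, so the Python sketch below simply evaluates $Q_n$ and the policy $P$. It is illustrative only and assumes views over the single binary predicate $R$, encoded as a set of pairs; the function names are hypothetical.

```python
from itertools import product


def q_atoms(n):
    """Index pairs (i, j), i < j, of the atoms R(x_i, x_j) in Q_n."""
    return [(i, j) for i in range(n) for j in range(i + 1, n)]


def satisfies_q(n, edges):
    """Brute force: does the R-structure given by `edges` satisfy Q_n?"""
    domain = sorted({c for edge in edges for c in edge})
    return any(
        all((h[i], h[j]) in edges for i, j in q_atoms(n))
        for h in product(domain, repeat=n)
    )


def satisfies_policy(edges):
    """Does `edges` satisfy P = exists x, y, z. R(x, y) and R(y, z) and R(z, x)?"""
    domain = {c for edge in edges for c in edge}
    return any((y, z) in edges and (z, x) in edges for x, y in edges for z in domain)


loop = {("a", "a")}                                   # the hidden dataset
order3 = {("c1", "c2"), ("c2", "c3"), ("c1", "c3")}   # a loop-free view, m = 3
print(satisfies_policy(loop), satisfies_policy(order3))  # True False
print(satisfies_q(3, order3), satisfies_q(4, order3))    # True False
```

The strict total order on three constants satisfies $Q_3$ without disclosing $P$, but any homomorphism from $Q_4$ would have to merge two variables and therefore needs a self-loop.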

Furthermore, determining the existence of an optimal view is undecidable even for Datalog CQE instances.

Theorem 12. The problem of checking whether a Datalog 
CQE instance admits an optimal view is undecidable. 

Proof (idea). The proof is by reduction from the undecidable problem of checking whether a deterministic Turing machine without a final state has a repeated configuration in a run on the empty tape. For each such machine we construct a CQE instance such that the run corresponds to an infinite grid-like "view" with axes for the tape and time. The ontology guarantees that representations of adjacent configurations agree with the transition function, and the policy forbids invalid configurations (e.g., with many symbols in a cell). Coinciding configurations appear in the run iff the grid can be "folded" to a finite view on all sides (e.g., if configurations can be merged). □

In what follows, we identify classes of CQE instances that 
guarantee existence of optimal view censors. We start by 
studying restrictions on Datalog ontologies and then adapt the 
obtained results to the OWL 2 profiles. 

5.1 Guarded Tree-Shaped Datalog 

The idea behind view censors is to anonymise information in the original data in such a way that the policy cannot be violated. For instance, in Example 5 we substituted the atom $\mathit{FoF}(\mathit{John}, \mathit{Bob})$ with $\mathit{FoF}(\mathit{John}, a_b)$, where $a_b$ is a fresh constant that is an anonymised copy of Bob. In general, however, many such anonymous copies may be required for each data constant to encode all the information required for ensuring optimality. The limit case is illustrated by Example 11, where no finite number of fresh constants suffices for optimality.


Figure 1: Part of the optimal view in Example 13 (omitted labels coincide with subscripts; arrows represent FoF).

Observe that the CQE instance used in Example 11 is neither guarded nor tree-shaped due to the form of the policy. In what follows, we show that an optimal view can always be constructed using at most exponentially many anonymous constants if we restrict ourselves to Datalog CQE instances that are guarded and tree-shaped.

We next provide an intuitive idea of the construction. Consider the view for a CQE instance $(\mathcal{O}, \mathcal{D}, P)$ consisting of the following three components $\mathcal{V}_1$, $\mathcal{V}_2$, and $\mathcal{V}_3$.

(1) Component $\mathcal{V}_1$ is any maximal set of unary atoms in $\mathcal{H}_{\mathcal{O},\mathcal{D}}$ that does not compromise the policy.

(2) To construct $\mathcal{V}_2$, we consider an anonymised copy $a_{\mathcal{B}}$ of each constant $a$ and each set $\mathcal{B}$ of unary predicates $B$ s.t. $\mathcal{H}_{\mathcal{O},\mathcal{D}} \models B(a)$. The corresponding set of all unary atoms $B(a_{\mathcal{B}})$ for $B \in \mathcal{B}$ is a part of $\mathcal{V}_2$ if and only if it is "safe", that is, it neither discloses the policy nor entails new facts together with $\mathcal{O} \cup \mathcal{V}_1$.

(3) Finally, $\mathcal{V}_3$ consists of a maximal set of binary atoms on all the constants (including the copies) that are justified by $\mathcal{H}_{\mathcal{O},\mathcal{D}}$ and do not disclose the policy.

Optimality of this view follows immediately from the construction. The view, however, may require exponentially many anonymised copies of data constants. The need for them is illustrated by the following example.

Example 13. Consider the CQE instance with ontology consisting of rules $\mathit{ThrFan}(x) \to \mathit{MovFan}(x)$ and $\mathit{ThrFan}(y) \wedge \mathit{FoF}(x,y) \to \mathit{MovFan}(x)$, dataset consisting of facts $\mathit{FoF}(\mathit{John},\mathit{Bob})$, $\mathit{ThrFan}(\mathit{John})$ and $\mathit{ThrFan}(\mathit{Bob})$, and policy $\mathit{MovFan}(x)$. The essential part of the optimal view obtained using the aforementioned construction is given in Figure 1. Here $\mathcal{V}_1 = \emptyset$, $\mathcal{V}_2$ contains unary atoms over the anonymised copies $\mathit{John}_{\{\mathit{MovFan}\}}$ and $\mathit{John}_{\{\mathit{MovFan},\mathit{ThrFan}\}}$ of John, and $\mathit{Bob}_{\{\mathit{MovFan},\mathit{ThrFan}\}}$ of Bob, while $\mathcal{V}_3$ contains the $\mathit{FoF}$ atoms represented by arrows. Note that at least two anonymised copies of John are necessary in any optimal view to answer correctly "harmless" queries such as

$$\exists x \exists y \exists z.\, \mathit{ThrFan}(x) \wedge \mathit{FoF}(x,y) \wedge \mathit{ThrFan}(y) \wedge \mathit{FoF}(z,y) \wedge \mathit{MovFan}(z) \wedge \mathit{FoF}(z,\mathit{Bob}). \quad ◊$$

This example shows that, in order to avoid the exponential blow-up in the number of anonymised copies, we need further restrictions on the ontology. In particular, in the case of multi-linear CQE instances we can guarantee that just one copy suffices for every constant.

The following theorem formalises the intuition above.

Theorem 14. Let $\mathcal{I}$ be a Datalog tree-shaped CQE instance. If $\mathcal{I}$ is guarded, it admits an optimal view that can be computed in time exponential in $|\mathcal{I}|$ and polynomial in data size. If $\mathcal{I}$ is multi-linear, it admits an optimal view that can be computed in time polynomial in $|\mathcal{I}|$. Additionally, $\mathcal{I}$ has a unique optimal censor if it is linear.

5.2 OWL 2 Profiles

The result in Theorem 14 is immediately applicable to RL ontologies, with the only restriction that they do not contain rules of types (1), (4), or (5) in Table 1. In contrast to RL, the QL and EL profiles provide means for capturing existentially quantified knowledge. To bridge this gap, we show that every (guarded) QL or EL CQE instance $\mathcal{I} = (\mathcal{O}, \mathcal{D}, P)$ can be polynomially transformed into a Datalog CQE instance $\mathcal{I}' = (\mathcal{O}', \mathcal{D}, P)$ by rewriting $\mathcal{O}$ into a (guarded and tree-shaped) Datalog ontology $\mathcal{O}'$ in such a way that optimal views for $\mathcal{I}$ can be directly obtained from those for $\mathcal{I}'$. We start by specifying what constitutes an acceptable rewriting $\mathcal{O}'$ of $\mathcal{O}$.

Definition 15. Let $\sigma$ be a set of constants. A Datalog ontology $\mathcal{O}'$ is a $\sigma$-rewriting of an ontology $\mathcal{O}$ if $\mathrm{cert}(Q, \mathcal{O}, \mathcal{D}) = \mathrm{cert}(Q, \mathcal{O}', \mathcal{D})$ for each tree-shaped CQ $Q$ and dataset $\mathcal{D}$ over constants from $\sigma$.

The following proposition provides the mechanism to reduce optimal view computation for arbitrary ontologies to the case of Datalog.

Proposition 16. Let $\mathcal{I} = (\mathcal{O}, \mathcal{D}, P)$ be a CQE instance over constants $\sigma$ with $P$ tree-shaped, and $\mathcal{O}'$ a $\sigma$-rewriting of $\mathcal{O}$ s.t. $\mathcal{O}' \models \mathcal{O}$. If $\mathcal{V}'$ is an optimal view for $\mathcal{I}' = (\mathcal{O}', \mathcal{D}, P)$, then $\mathcal{H}_{\mathcal{O}',\mathcal{V}'}$ is an optimal view for $\mathcal{I}$.

With this proposition at hand, we just need to devise a technique for rewriting any QL (or guarded EL) ontology into a stronger Datalog ontology which, however, preserves the answers to all tree-shaped queries. To this end, we exploit techniques developed for the so-called combined approach to query answering [Kontchakov et al., 2011; Lutz et al., 2009; Lutz et al., 2013; Stefanoni et al., 2013]. The idea is to transform rules of type (3) into Datalog by Skolemising existentially quantified variables into globally fresh constants. Such a transformation strengthens the ontology; however, if applied to a QL or guarded EL ontology, it preserves answers to tree-shaped CQs for any dataset over $\sigma$ [Stefanoni et al., 2013].

Definition 17. Let $\mathcal{O}$ be an ontology and $\sigma$ be a set of constants. The ontology $\Xi_\sigma(\mathcal{O})$ is obtained from $\mathcal{O}$ by replacing each rule $A(x) \to \exists y.[R(x,y) \wedge B(y)]$ with

$$A(x) \to R'(x, a), \qquad R'(x, y) \to R(x, y), \qquad R'(x, y) \to B(y),$$

where $R'$ is a fresh binary predicate, uniquely associated to the original rule, and $a$ is a globally fresh constant not from $\sigma$, uniquely associated to $A$ and $R$.
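Definition 17 is a purely syntactic transformation, so it is easy to prototype. The Python sketch below is a minimal rendering under stated assumptions. Type-(3) rules are encoded as `("exists", A, R, B)` and all other rules as `("datalog", body, head)`. Fresh constants are drawn as `c0`, `c1`, ..., avoiding $\sigma$ and the constants already used in the ontology. The fresh predicate for the $i$-th rule is named `R'i`, which we assume does not clash with existing predicate names.

```python
from itertools import count


def skolemise(ontology, sigma):
    """Sketch of Xi_sigma(O) from Definition 17 (terms starting with "?" are variables)."""
    taken = set(sigma) | {
        t
        for rule in ontology if rule[0] == "datalog"
        for _, args in rule[1] + [rule[2]]
        for t in args if not t.startswith("?")
    }
    fresh_ids = count()
    const_for = {}  # (A, R) -> its globally fresh constant
    result = []
    for i, rule in enumerate(ontology):
        if rule[0] != "exists":
            result.append(rule)  # rules other than type (3) are kept unchanged
            continue
        _, a, r, b = rule
        if (a, r) not in const_for:
            c = f"c{next(fresh_ids)}"
            while c in taken:
                c = f"c{next(fresh_ids)}"
            taken.add(c)
            const_for[(a, r)] = c
        c = const_for[(a, r)]
        r_prime = f"{r}'{i}"  # hypothetical name of the fresh predicate R'
        result += [
            ("datalog", [(a, ("?x",))], (r_prime, ("?x", c))),
            ("datalog", [(r_prime, ("?x", "?y"))], (r, ("?x", "?y"))),
            ("datalog", [(r_prime, ("?x", "?y"))], (b, ("?y",))),
        ]
    return result


onto = [
    ("exists", "A", "R", "B"),
    ("exists", "A", "R", "D"),
    ("datalog", [("B", ("?x",))], ("C", ("?x",))),
]
for rule in skolemise(onto, sigma={"c0"}):
    print(rule)
```

Both existential rules share the pair $(A, R)$, so they receive the same fresh constant (`c1`, since `c0` is in $\sigma$) but distinct fresh predicates, as the definition requires.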


Theorem 18. For any ontology $\mathcal{O}$ we have $\Xi_\sigma(\mathcal{O}) \models \mathcal{O}$. Furthermore, if $\mathcal{O}$ is either a QL or guarded EL ontology, then $\Xi_\sigma(\mathcal{O})$ is a $\sigma$-rewriting of $\mathcal{O}$.


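Proposition 16 and Theorem 18 ensure that $\mathcal{H}_{\Xi_\sigma(\mathcal{O}),\mathcal{V}}$ is an optimal view for $\mathcal{I}$ whenever $\mathcal{V}$ is such a view for $\mathcal{I}' = (\Xi_\sigma(\mathcal{O}), \mathcal{D}, P)$. The transformation of $\mathcal{O}$ to $\Xi_\sigma(\mathcal{O})$ preserves linearity, guardedness, and tree-shapedness, so the results of Section 5.1 are applicable to $\mathcal{I}'$.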
Theorem 19. Every guarded EL CQE instance admits an optimal view that can be computed in exponential time. Every QL instance admits a unique optimal censor, which is implementable by a view of polynomial size.

6 Optimal Obstruction Censors 

Similarly to Section 5, we start the study of optimal obstruction censors with their limitations. The following example shows that such a censor may not exist even if we restrict ourselves to ontologies with only one rule.

Example 20. Consider a CQE instance with ontology $\{R(x,y) \wedge A(y) \to A(x)\}$, dataset $\{R(a,a), A(a)\}$, and policy $A(a)$. Let $Q_n$, $n > 0$, be a family of Boolean CQs

$$\exists \vec{x}.\, R(a, x_1) \wedge R(x_1, x_2) \wedge \dots \wedge R(x_{n-1}, x_n) \wedge A(x_n).$$

With the help of the ontology, each $Q_n$ discloses the policy. Thus, each $Q_n$ should entail a Boolean CQ in any optimal obstruction. Consider the set of all CQs that are entailed by queries $Q_n$ but not equivalent to any of them. On the one hand, this set is "harmless", that is, any obstruction censor should answer all these queries positively. On the other hand, the CQs $Q_n$ do not entail each other. Hence, any optimal obstruction should contain a CQ equivalent to each $Q_n$, which is however not possible, because $n$ is unbounded. ◊

We leave open the question of whether checking the existence of an optimal obstruction for a CQE instance is decidable. Answering this question positively would imply a solution to a long-standing open problem. In the appendix we provide a reduction from the problem of uniform boundedness for binary Datalog, for which decidability is unknown [Marcinkowski, 1999], to the existence problem of optimal obstructions for Datalog CQE instances. In the rest of the section we give a characterisation of optimal obstructions for Datalog instances in terms of resolution proofs and identify restrictions for which the characterisation guarantees existence of such obstructions.

6.1 Characterisation of Optimal Obstructions 

We first recall the standard notion of SLD resolution. 

A goal is a conjunction of atoms. An SLD resolution step takes a goal $\beta \wedge \rho$ with a selected atom $\rho$ and a sentence $r$ that is either a Datalog rule $\varphi \to \delta$ or a fact $\delta$, and produces a new goal $\beta\theta \wedge \varphi\theta$, where $\theta$ is a most general unifier of $\rho$ and $\delta$ (assuming that $\varphi$ is empty in the case when $r$ is a fact). An (SLD) proof of a goal $G_0$ in a Datalog ontology $\mathcal{O}$ and dataset $\mathcal{D}$ is a sequence of goals $G_0, G_1, \dots, G_n$, where $G_n$ is empty, and each $G_i$ is obtained from $G_{i-1}$ and a sentence (rule or fact) in $\mathcal{O} \cup \mathcal{D}$ by an SLD resolution step.

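Resolution is sound and complete: for any Datalog ontology $\mathcal{O}$, dataset $\mathcal{D}$, and goal $G$ (such that $\mathcal{O} \cup \mathcal{D}$ is satisfiable) there is a proof of $G$ in $\mathcal{O}$ and $\mathcal{D}$ if and only if $\mathcal{O} \cup \mathcal{D} \models \exists^* G$ for the existential closure $\exists^* G$ of $G$.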
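The SLD resolution steps described above can be sketched as a small Prolog-style interpreter. The Python code below is illustrative only. It always selects the leftmost atom and renames rules apart before each step. A hypothetical `max_steps` parameter bounds proof length, so the search may miss proofs longer than the bound. Encoding programs as `(body, head)` pairs, with facts having an empty body, is an assumption of this sketch.

```python
from itertools import count


def is_var(term):
    return term.startswith("?")


def walk(term, s):
    """Follow variable bindings in the substitution s."""
    while is_var(term) and term in s:
        term = s[term]
    return term


def unify(a, b, s):
    """Extend s to a most general unifier of function-free atoms a and b, or None."""
    if a[0] != b[0] or len(a[1]) != len(b[1]):
        return None
    s = dict(s)
    for t1, t2 in zip(a[1], b[1]):
        t1, t2 = walk(t1, s), walk(t2, s)
        if t1 == t2:
            continue
        if is_var(t1):
            s[t1] = t2
        elif is_var(t2):
            s[t2] = t1
        else:
            return None
    return s


def sld_prove(goal, program, max_steps=12):
    """True if the goal (a list of atoms) has an SLD proof of at most max_steps steps."""
    fresh = count()

    def rename(rule):
        n = next(fresh)

        def ren(atom):
            return (atom[0], tuple(f"{t}#{n}" if is_var(t) else t for t in atom[1]))

        body, head = rule
        return [ren(a) for a in body], ren(head)

    def solve(goal, s, steps):
        if not goal:
            return True  # the empty goal ends a proof
        if steps == 0:
            return False
        selected, rest = goal[0], goal[1:]  # leftmost selection
        for rule in program:
            body, head = rename(rule)
            s2 = unify(selected, head, s)
            if s2 is not None and solve(body + rest, s2, steps - 1):
                return True
        return False

    return solve(list(goal), {}, max_steps)


ex24 = [
    ([("Likes", ("?x", "?y"))], ("Movie", ("?y",))),
    ([("Likes", ("?x", "?y"))], ("MovFan", ("?x",))),
    ([], ("Likes", ("John", "Seven"))),
]
ex20 = [
    ([("R", ("?x", "?y")), ("A", ("?y",))], ("A", ("?x",))),
    ([], ("R", ("a", "a"))),
    ([], ("A", ("a",))),
]
print(sld_prove([("MovFan", ("John",))], ex24))  # True
print(sld_prove([("A", ("a",))], ex20))          # True
```

On the ontology of Example 20 the recursive rule is tried first, so the search descends until the step bound is reached and then closes the proof with the fact $A(a)$. The bound is what keeps this left-recursive rule from looping.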

We next characterise optimal obstructions using SLD proofs. Intuitively, if an obstruction censor answers positively a sufficient number of Boolean CQs $\exists^* G$ for goals $G$ in a proof of a policy, then a user could reconstruct (a part of) this proof and compromise the policy. Also, there can be many proofs, and a user may compromise the policy by reconstructing any of them. Thus, to ensure that a censor is confidentiality preserving, we must guarantee that the obstruction contains enough CQs to prevent reconstruction of any proof. If we also want the censor to be optimal, the obstruction should not block too many CQs. As we will see later on, these requirements may be in conflict and lead to an infinite "obstruction". The next definitions formalise this intuition.

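Definition 21. Let $\mathcal{I} = (\mathcal{O}, \mathcal{D}, P)$ be a Datalog CQE instance, $\mathcal{Q}$ be the set of all Boolean CQs $\exists^* G$ for goals $G$ in proofs of $P(\vec{a})$ in $\mathcal{O}$ and $\mathcal{D}$ for some tuple of constants $\vec{a}$, and $\mathcal{S}$ be a maximal subset of $\mathcal{Q}$ such that $\mathcal{O} \cup \mathcal{S} \not\models P(\vec{a})$ for any $\vec{a}$. Then, a pseudo-obstruction for $\mathcal{I}$ is a subset of $\mathcal{Q} \setminus \mathcal{S}$ that contains a CQ $Q'$ for any $Q$ in $\mathcal{Q} \setminus \mathcal{S}$ with $Q \models Q'$.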

The next theorem establishes the connection between 
pseudo-obstructions and optimality. 

Theorem 22. Let $\mathcal{I}$ be a Datalog CQE instance.

1. If $\mathcal{Q}'$ is a finite pseudo-obstruction for $\mathcal{I}$, then $\bigvee \mathcal{Q}'$ is an optimal obstruction for $\mathcal{I}$.

2. If each pseudo-obstruction for $\mathcal{I}$ is infinite, then no optimal obstruction censor for $\mathcal{I}$ exists.

This theorem has implications for the expressive power of obstructions. In particular, we can now extend the result in Theorem 10, which applies to censors that are not necessarily optimal, to capture also optimality.

Theorem 23. There is a CQE instance, which is both RL and EL, admitting an optimal view, but no optimal obstruction. Conversely, there exists an RL CQE instance that admits an optimal obstruction, but no optimal view.

6.2 Linear Datalog and QL 

We now show how to apply resolution-based techniques to compute optimal obstructions for linear Datalog CQE instances and then adapt the results to QL. In fact, we can guarantee not only existence of optimal obstructions for such instances, but also uniqueness and polynomiality of the corresponding censors.

Our solution for linear Datalog instances is based on the computation of the set $\mathcal{Q}$ of existential closures of goals in the proofs of policies. Since all the rules in the ontology are linear and the body of the policy is an atom (recall that the rule corresponding to the policy should be linear as well), each of these goals consists of a single atom, except the last goal in each proof, which is empty. There are only polynomially many such atoms (up to renaming of variables). So, all the proofs can be represented by a single finite proof graph with atoms and the empty conjunction (denoted by $\top$) as nodes, and SLD resolution steps as edges. This is illustrated by the following example.

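Example 24. Consider a CQE instance with ontology

$$\{\mathit{Likes}(x, y) \to \mathit{Movie}(y),\ \mathit{Likes}(x, y) \to \mathit{MovFan}(x)\},$$

dataset $\{\mathit{Likes}(\mathit{John}, \mathit{Seven})\}$, and policy $\mathit{MovFan}(\mathit{John})$. A fragment of the proof graph is given in Figure 2. ◊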

Using proof graphs we can compute optimal censors. 

Theorem 25. Let $\mathcal{I} = (\mathcal{O}, \mathcal{D}, P)$ be a linear Datalog CQE instance, and let $\mathcal{S}$ be the set of all nodes in the proof graph of $\mathcal{O} \cup \mathcal{D}$ on the paths from facts $P(\vec{a})$ with any tuple of constants $\vec{a}$ to $\top$. Then, the Boolean UCQ

$$\mathcal{U} = \bigvee_{G \in \mathcal{S} \setminus \{\top\}} \exists^* G$$

is an optimal obstruction computable in polynomial time, and $\mathit{ocens}_{\mathcal{U}}$ is the unique optimal censor for $\mathcal{I}$.

Figure 2: Fragment of proof graph from Example 24 (nodes include MovFan(John), Movie(Seven), Likes(x, Seven), MovFan(x), Movie(y), Likes(x, y), and ⊤).

Example 26. For the instance in Example 24 there is only one path in the proof graph from the policy to $\top$, and $\mathcal{S} = \{\mathit{MovFan}(\mathit{John}), \mathit{Likes}(\mathit{John}, y), \top\}$. Thus, $\mathit{MovFan}(\mathit{John}) \vee \exists y.\mathit{Likes}(\mathit{John}, y)$ is optimal. ◊
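The construction behind Theorem 25 can be sketched in a few dozen lines of Python. This is a hedged illustration, not the authors' algorithm. It assumes single-atom rule bodies (linear rules) and takes the ground policy facts $P(\vec{a})$ as an explicit input. Nodes are atoms whose variables are renamed canonically, so that atoms equal up to renaming coincide. All names are hypothetical. The sketch explores the proof graph from the policy facts and keeps the nodes from which $\top$ is reachable.

```python
from collections import deque

TOP = ("TOP", ())  # stands for the empty goal


def is_var(term):
    return term.startswith("?")


def walk(term, s):
    while is_var(term) and term in s:
        term = s[term]
    return term


def unify(a, b):
    """Most general unifier of two function-free atoms, or None."""
    if a[0] != b[0] or len(a[1]) != len(b[1]):
        return None
    s = {}
    for t1, t2 in zip(a[1], b[1]):
        t1, t2 = walk(t1, s), walk(t2, s)
        if t1 == t2:
            continue
        if is_var(t1):
            s[t1] = t2
        elif is_var(t2):
            s[t2] = t1
        else:
            return None
    return s


def canon(atom):
    """Rename variables by order of first occurrence: ?v0, ?v1, ..."""
    names = {}
    return (atom[0], tuple(names.setdefault(t, f"?v{len(names)}") if is_var(t) else t
                           for t in atom[1]))


def _rename(atom):
    """Rename rule variables apart from canonical goal variables (?v0, ?v1, ...)."""
    return (atom[0], tuple("?r" + t[1:] if is_var(t) else t for t in atom[1]))


def successors(goal, rules, facts):
    """Goals (single atoms or TOP) obtained from `goal` by one SLD resolution step."""
    for fact in facts:
        if unify(goal, fact) is not None:
            yield TOP
    for body, head in rules:  # linear rules: `body` is a single atom
        s = unify(goal, _rename(head))
        if s is not None:
            pred, args = _rename(body)
            yield canon((pred, tuple(walk(t, s) for t in args)))


def obstruction_nodes(policy_facts, rules, facts):
    """Nodes on paths from the given policy facts to TOP in the proof graph."""
    graph, queue = {}, deque(canon(p) for p in policy_facts)
    while queue:  # breadth-first exploration from the policy facts
        g = queue.popleft()
        if g == TOP or g in graph:
            continue
        graph[g] = set(successors(g, rules, facts))
        queue.extend(graph[g])
    on_path, changed = {TOP}, True
    while changed:  # backward closure: keep nodes that can reach TOP
        changed = False
        for g, succ in graph.items():
            if g not in on_path and succ & on_path:
                on_path.add(g)
                changed = True
    return on_path


rules = [
    (("Likes", ("?x", "?y")), ("Movie", ("?y",))),
    (("Likes", ("?x", "?y")), ("MovFan", ("?x",))),
]
facts = [("Likes", ("John", "Seven"))]
S = obstruction_nodes([("MovFan", ("John",))], rules, facts)
print(sorted(S - {TOP}))  # [('Likes', ('John', '?v0')), ('MovFan', ('John',))]
```

Every node reached by the breadth-first search lies on a path from a policy fact, so keeping only the nodes that reach $\top$ yields exactly the nodes on such paths. On Example 24 the result matches the set $\mathcal{S}$ of Example 26, and nodes such as $\mathit{Movie}(\mathit{Seven})$ are never reached from the policy.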


Finally, note that the transformation of a QL ontology $\mathcal{O}$ to an RL ontology $\Xi_\sigma(\mathcal{O})$ given in Definition 17 preserves linearity of rules. Hence, Theorem 18 together with Theorem 25 yields the following result.


Theorem 27. Every QL CQE instance admits a unique optimal censor based on an obstruction that can be computed in polynomial time.


7 Related Work

The formal study of privacy in databases has received significant attention. CQE for propositional databases with complete information has been studied in [Sicherman et al., 1983; Bonatti et al., 1993; Biskup and Bonatti, 2001; Biskup and Bonatti, 2004]. CQE was extended to (propositional) incomplete databases in [Biskup and Weibert, 2008]. Miklau and Suciu (2007) studied perfect privacy. Perfect privacy, however, is very strict and may preclude publishing of any meaningful information when extended to ontologies [Cuenca Grau and Horrocks, 2008]. View-based authorisation was investigated in [Rizvi et al., 2004; Zhang and Mendelzon, 2005], and Deutsch and Papakonstantinou (2005) analysed the implications to privacy derived from publishing database views.

Privacy in the context of ontologies is a growing area of research. Information hiding at the schema level was studied in [Konev et al., 2009; Cuenca Grau and Motik, 2012]. Data privacy for EL and ALC DLs was investigated in [Stouppa and Studer, 2007; Tao et al., 2010], and the notion of a privacy-preserving reasoner was introduced in [Bao et al., 2007]. Calvanese et al. (2012) extended the view-based authorisation framework by Zhang and Mendelzon (2005) to DL ontologies.

An early work on non-propositional CQE is [Biskup and Bonatti, 2007]. CQE for ontologies has been studied in [Cuenca Grau et al., 2013; Bonatti and Sauro, 2013; Studer and Werner, 2014]. We extend Cuenca Grau et al. (2013) with a wide range of new results: (i) we consider arbitrary CQs as policies rather than just ground facts; (ii) we introduce obstruction censors, compare their expressive power with that of view censors, characterise their optimality, and show how to compute obstructions for linear Datalog and QL ontologies; (iii) we show undecidability of checking existence of an optimal view censor and provide algorithms for guarded Datalog and all the OWL 2 profiles. We see our work as complementary to Bonatti and Sauro (2013) and Studer and Werner (2014). The former focuses on situations where attackers have access to external sources of background knowledge; they identify additional vulnerabilities and propose solutions within the CQE framework. The latter focuses on meta-properties of general censors that, in contrast to ours, can also provide unsound answers or refuse queries.

8 Conclusions

We studied CQE in the context of ontologies. Our results provide insights into the fundamental trade-off between accessibility and confidentiality of information. Moreover, they yield a flexible way for system designers to ensure selective access to data. In particular, we proposed tractable view-based solutions for CQE instances with tree-shaped and linear Datalog and QL ontologies, and tractable obstruction-based solutions for linear Datalog and QL ontologies. Our solutions can be implemented using off-the-shelf query answering infrastructure and provide a starting point for CQE system development.
A Appendix (Proofs)

A.1 Proofs for Section 4

Before proving Theorem 9, we present the following notation and a lemma. Let $\mathcal{I}$ be a finite structure and $f$ a function associating a fresh variable to each domain element of $\mathcal{I}$. The query for $\mathcal{I}$ is the Boolean CQ $Q_{\mathcal{I}}$ defined as follows, with $R_1, \ldots, R_n$ the predicates interpreted by $\mathcal{I}$ (all variables are existentially quantified):

$$Q_{\mathcal{I}} = \bigwedge_{1 \le i \le n} \ \bigwedge \{ R_i(f(u_1), \ldots, f(u_m)) \mid (u_1, \ldots, u_m) \in R_i^{\mathcal{I}} \}.$$

Given a BCQ $Q$, denote by $[Q]$ the structure interpreting each $R_i$ occurring in $Q$ with $(f(u_1), \ldots, f(u_m))$ for every atom $R_i(u_1, \ldots, u_m)$ in $Q$, where $f$ maps each constant in $Q$ to itself and each variable $y$ to a fresh constant $d_y$.
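The two constructions above, the query $Q_{\mathcal{I}}$ of a structure and the frozen structure $[Q]$ of a BCQ, can be illustrated with a short Python sketch. It is our illustration, not part of the original development. Structures and query bodies are modelled as sets of `(predicate, arguments)` pairs, and we assume that variables are strings starting with `?`. The helper names `query_of` and `freeze` are hypothetical.

```python
from itertools import count

# Sketch only. A structure, or the body of a BCQ, is a set of atoms
# (predicate, tuple_of_terms); by assumption, variables start with "?".


def query_of(structure):
    """Q_I: replace every domain element by a fresh variable (the function f)."""
    fresh = count()
    f = {}
    for _, args in sorted(structure):      # sorted for deterministic naming
        for e in args:
            if e not in f:
                f[e] = f"?y{next(fresh)}"
    return {(p, tuple(f[e] for e in args)) for p, args in structure}


def freeze(query):
    """[Q]: keep constants, map each variable ?y to a fresh constant d_y."""
    return {
        (p, tuple("d_" + t[1:] if t.startswith("?") else t for t in args))
        for p, args in query
    }


graph = {("edge", ("g", "b")), ("edge", ("b", "g"))}
print(sorted(query_of(graph)))              # [('edge', ('?y0', '?y1')), ('edge', ('?y1', '?y0'))]
print(sorted(freeze({("R", ("a", "?x"))})))  # [('R', ('a', 'd_x'))]
```

Freezing the query of a structure yields an isomorphic copy of that structure, which is the property used in the proof of Theorem 9.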

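Lemma 28. Let $\mathcal{J}$ be a finite structure and let $\mathcal{C}$ be a class of finite structures. Then, the following holds:

$$\{\mathcal{I} \in \mathcal{C} \mid \mathcal{I} \not\to \mathcal{J}\} = \{\mathcal{I} \in \mathcal{C} \mid \mathcal{I} \models \textstyle\bigvee_{\mathcal{K} \in \mathcal{C},\, \mathcal{K} \not\to \mathcal{J}} Q_{\mathcal{K}}\}.$$

Proof. Let $\mathcal{I} \in \mathcal{C}$ be such that $\mathcal{I} \not\to \mathcal{J}$. Clearly, $\mathcal{I} \models Q_{\mathcal{I}}$, and hence $\mathcal{I} \models \bigvee_{\mathcal{K} \in \mathcal{C}, \mathcal{K} \not\to \mathcal{J}} Q_{\mathcal{K}}$, as required. Conversely, assume that $\mathcal{I} \in \mathcal{C}$ is such that $\mathcal{I} \models \bigvee_{\mathcal{K} \in \mathcal{C}, \mathcal{K} \not\to \mathcal{J}} Q_{\mathcal{K}}$. Then there exists $\mathcal{K}$ such that $\mathcal{K} \in \mathcal{C}$, $\mathcal{K} \not\to \mathcal{J}$ and $\mathcal{I} \models Q_{\mathcal{K}}$. The latter implies that $\mathcal{K} \to \mathcal{I}$, and hence we can deduce $\mathcal{I} \not\to \mathcal{J}$, as required. Otherwise, by composition of homomorphisms, we would have $\mathcal{K} \to \mathcal{J}$, which is a contradiction. □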
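Lemma 28 can be sanity-checked by brute force on a finite class of small graphs. The Python sketch below is an illustration under our own assumptions. Structures are sets of `(predicate, arguments)` atoms, and the homomorphism search is exhaustive, so it is only practical for tiny structures. The check relies on the fact that $\mathcal{I} \models Q_{\mathcal{K}}$ iff $\mathcal{K} \to \mathcal{I}$.

```python
from itertools import product


def hom_exists(src, dst):
    """Exhaustive search for a homomorphism from structure src to structure dst."""
    src_elems = sorted({e for _, args in src for e in args})
    dst_elems = sorted({e for _, args in dst for e in args})
    for image in product(dst_elems, repeat=len(src_elems)):
        h = dict(zip(src_elems, image))
        if all((p, tuple(h[e] for e in args)) in dst for p, args in src):
            return True
    return False


def lemma28_holds(C, J):
    """Compare both sides of Lemma 28 on a finite list C of structures."""
    bad = [K for K in C if not hom_exists(K, J)]            # K in C with K -/-> J
    lhs = [I for I in C if not hom_exists(I, J)]
    # I |= OR_{K in bad} Q_K  iff  some K in bad maps homomorphically into I
    rhs = [I for I in C if any(hom_exists(K, I) for K in bad)]
    return lhs == rhs


def graph(edges):
    return {("edge", e) for e in edges}


C = [graph([(1, 2), (2, 1)]), graph([(1, 2), (2, 3), (3, 1)]),
     graph([(1, 1)]), graph([(1, 2), (2, 3)])]
J = graph([("g", "b"), ("b", "g")])
print(lemma28_holds(C, J))  # True
```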

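Theorem 9. Let $\mathbb{I} = (\mathcal{O}, \mathcal{D}, \mathcal{P})$ be Datalog and $\mathcal{C} = \{\mathcal{I} \mid \mathcal{I} \text{ finite}, \mathcal{I} \to \mathcal{H}_{\mathcal{O},\mathcal{D}}\}$. Then, $\mathsf{vcens}_{\mathcal{V}} = \mathsf{ocens}_{\mathcal{U}}$ iff $\mathcal{U}$ defines the set $\mathcal{C} \setminus \{\mathcal{I} \in \mathcal{C} \mid \mathcal{I} \to \mathcal{H}_{\mathcal{O},\mathcal{V}}\}$.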


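Proof.

(⇐) Assume that $\mathcal{U}$ defines $\mathcal{C} \setminus \{\mathcal{I} \in \mathcal{C} \mid \mathcal{I} \to \mathcal{H}_{\mathcal{O},\mathcal{V}}\}$, which is equal to $\{\mathcal{I} \in \mathcal{C} \mid \mathcal{I} \not\to \mathcal{H}_{\mathcal{O},\mathcal{V}}\}$. Then, for each $\mathcal{I} \in \mathcal{C}$ we have that $\mathcal{I} \not\to \mathcal{H}_{\mathcal{O},\mathcal{V}}$ iff $\mathcal{I} \models \mathcal{U}$. By Lemma 28, the following holds for each $\mathcal{I} \in \mathcal{C}$:

$$\mathcal{I} \models \mathcal{U} \quad\text{iff}\quad \mathcal{I} \models \bigvee_{\mathcal{K} \in \mathcal{C},\, \mathcal{K} \not\to \mathcal{H}_{\mathcal{O},\mathcal{V}}} Q_{\mathcal{K}}. \tag{1}$$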


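Let $Q(\vec x)$ be a CQ, and let $\vec t \in \mathsf{cert}(Q, \mathcal{O}, \mathcal{D})$. This implies that $[Q(\vec t)] \to \mathcal{H}_{\mathcal{O},\mathcal{D}}$, and hence $[Q(\vec t)] \in \mathcal{C}$. We show that

$\vec t \in \mathsf{vcens}_{\mathcal{V}}(Q)$ iff $\vec t \in \mathsf{ocens}_{\mathcal{U}}(Q)$.

For the forward direction, assume that $\vec t \in \mathsf{vcens}_{\mathcal{V}}(Q)$. Then $\mathcal{O} \cup \mathcal{V} \models Q(\vec t)$, and hence $[Q(\vec t)] \to \mathcal{H}_{\mathcal{O},\mathcal{V}}$. We can then conclude that $[Q(\vec t)] \not\models \bigvee_{\mathcal{K} \in \mathcal{C}, \mathcal{K} \not\to \mathcal{H}_{\mathcal{O},\mathcal{V}}} Q_{\mathcal{K}}$. Otherwise, $\mathcal{K} \to [Q(\vec t)]$ for some $Q_{\mathcal{K}}$ in the disjunction; since $[Q(\vec t)] \to \mathcal{H}_{\mathcal{O},\mathcal{V}}$ and homomorphisms compose, we would have $\mathcal{K} \to \mathcal{H}_{\mathcal{O},\mathcal{V}}$, which is a contradiction. But then Equation (1) implies that $[Q(\vec t)] \not\models \mathcal{U}$, and by the definition of obstruction censor, $\vec t \in \mathsf{ocens}_{\mathcal{U}}(Q)$, as required.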

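For the backward direction, assume now that $\vec t \in \mathsf{ocens}_{\mathcal{U}}(Q)$. Then, by the definition of obstruction censor, we have $[Q(\vec t)] \not\models \mathcal{U}$. By Equation (1), we then have $[Q(\vec t)] \not\models \bigvee_{\mathcal{K} \in \mathcal{C}, \mathcal{K} \not\to \mathcal{H}_{\mathcal{O},\mathcal{V}}} Q_{\mathcal{K}}$, and Lemma 28 immediately implies that $[Q(\vec t)] \notin \{\mathcal{I} \in \mathcal{C} \mid \mathcal{I} \not\to \mathcal{H}_{\mathcal{O},\mathcal{V}}\}$. From this, we must conclude that $[Q(\vec t)] \in \{\mathcal{I} \in \mathcal{C} \mid \mathcal{I} \to \mathcal{H}_{\mathcal{O},\mathcal{V}}\}$, and hence $[Q(\vec t)] \to \mathcal{H}_{\mathcal{O},\mathcal{V}}$. This implies $\mathcal{O} \cup \mathcal{V} \models Q(\vec t)$ and $\vec t \in \mathsf{vcens}_{\mathcal{V}}(Q)$, as required.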

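(⇒) Assume that $\mathsf{ocens}_{\mathcal{U}} = \mathsf{vcens}_{\mathcal{V}}$. To show that $\mathcal{U}$ defines $\{\mathcal{I} \in \mathcal{C} \mid \mathcal{I} \not\to \mathcal{H}_{\mathcal{O},\mathcal{V}}\}$, we prove that $\mathcal{I} \models \mathcal{U}$ iff $\mathcal{I} \not\to \mathcal{H}_{\mathcal{O},\mathcal{V}}$ for every structure $\mathcal{I}$ in $\mathcal{C}$. Note that $\mathcal{I} \to \mathcal{H}_{\mathcal{O},\mathcal{D}}$, so $\mathcal{O} \cup \mathcal{D} \models Q_{\mathcal{I}}$, and that $[Q_{\mathcal{I}}]$ is isomorphic to $\mathcal{I}$.

If $\mathcal{I} \models \mathcal{U}$, then $\mathsf{ocens}_{\mathcal{U}}(Q_{\mathcal{I}}) = \mathsf{False}$. Since $\mathsf{ocens}_{\mathcal{U}} = \mathsf{vcens}_{\mathcal{V}}$, we also have $\mathsf{vcens}_{\mathcal{V}}(Q_{\mathcal{I}}) = \mathsf{False}$, and hence $\mathcal{O} \cup \mathcal{V} \not\models Q_{\mathcal{I}}$. Consequently, $\mathcal{I} \not\to \mathcal{H}_{\mathcal{O},\mathcal{V}}$, as required.

If $\mathcal{I} \not\to \mathcal{H}_{\mathcal{O},\mathcal{V}}$, then $\mathcal{O} \cup \mathcal{V} \not\models Q_{\mathcal{I}}$, and consequently $\mathsf{vcens}_{\mathcal{V}}(Q_{\mathcal{I}}) = \mathsf{False}$. Since $\mathsf{ocens}_{\mathcal{U}} = \mathsf{vcens}_{\mathcal{V}}$, we have $\mathsf{ocens}_{\mathcal{U}}(Q_{\mathcal{I}}) = \mathsf{False}$. Since $\mathcal{O} \cup \mathcal{D} \models Q_{\mathcal{I}}$, we necessarily have $\mathcal{I} \models \mathcal{U}$. □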

Theorem 10. There is a Datalog CQE instance admitting a confidentiality preserving view censor that is not based on any 
obstruction. Conversely, there is a Datalog CQE instance admitting a confidentiality preserving obstruction censor that is not 
based on any view. 

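Proof. First we illustrate that obstruction censors cannot always simulate view censors. Consider the CQE instance $\mathbb{I} = (\emptyset, \mathcal{D}, \emptyset)$, where $\mathcal{D}$ represents an undirected graph with nodes “green” $g$ and “blue” $b$, which are connected by $edge$ in all possible ways:

$$\mathcal{D} = \{edge(g, b),\ edge(b, g),\ edge(b, b),\ edge(g, g)\}.$$

Clearly, $\mathcal{D}$ entails every Boolean CQ over the $edge$ relation, and thus every graph can be homomorphically embedded into $\mathcal{D}$. Consider $\mathcal{V} = \{edge(g, b), edge(b, g)\}$. Since the ontology is empty, $\mathcal{H}_{\mathcal{O},\mathcal{V}} = \mathcal{V}$. Hence $\{\mathcal{I} \mid \mathcal{I} \text{ is finite}, \mathcal{I} \to \mathcal{H}_{\mathcal{O},\mathcal{D}}, \text{ and } \mathcal{I} \not\to \mathcal{H}_{\mathcal{O},\mathcal{V}}\}$ is the class of all graphs that are not 2-colourable. It is well known that this class of graphs is not first-order definable, and hence it cannot be captured by a UCQ. By Theorem 9, no obstruction realises $\mathsf{vcens}_{\mathcal{V}}$.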
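The following Python sketch makes the first example concrete. It is our illustration, and the function names are hypothetical. It evaluates $\mathsf{vcens}_{\mathcal{V}}$ for $\mathcal{V} = \{edge(g, b), edge(b, g)\}$ on Boolean queries over $edge$, each given as an edge list. The ontology is empty and every such query is certain over $\mathcal{D}$, so the censor answers positively iff $[Q] \to \mathcal{V}$. The sketch compares this answer with a standard BFS 2-colouring test of the underlying undirected graph.

```python
from collections import deque
from itertools import product

VIEW = {("g", "b"), ("b", "g")}  # edge relation of the view V


def maps_into(src_edges, dst_edges):
    """Exhaustive homomorphism test between two edge sets."""
    src = sorted({u for e in src_edges for u in e})
    dst = sorted({u for e in dst_edges for u in e})
    for image in product(dst, repeat=len(src)):
        h = dict(zip(src, image))
        if all((h[u], h[w]) in dst_edges for u, w in src_edges):
            return True
    return False


def vcens(query_edges):
    """Answer of the view censor: [Q] -> H_{O,V} = V (empty ontology)."""
    return maps_into(set(query_edges), VIEW)


def two_colourable(edges):
    """BFS 2-colouring of the underlying undirected graph; self loops fail."""
    adj = {}
    for u, w in edges:
        if u == w:
            return False
        adj.setdefault(u, set()).add(w)
        adj.setdefault(w, set()).add(u)
    colour = {}
    for start in adj:
        if start in colour:
            continue
        colour[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in colour:
                    colour[w] = 1 - colour[u]
                    queue.append(w)
                elif colour[w] == colour[u]:
                    return False
    return True


queries = [[(1, 2), (2, 3), (3, 1)],          # triangle
           [(1, 2), (2, 3), (3, 4), (4, 1)],  # 4-cycle
           [(1, 1)],                          # self loop
           [(1, 2), (3, 2)]]                  # path with a sink
print([vcens(q) for q in queries])            # [False, True, False, True]
```

The agreement with `two_colourable` on every query is exactly the class described in the proof. The proof's point is that no finite UCQ can draw this boundary.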

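Next we construct an obstruction censor which cannot be simulated by a view censor. Consider the instance $\mathbb{I} = (\mathcal{O}, \mathcal{D}, \emptyset)$, where $\mathcal{D} = \{edge(a, a)\}$ and $\mathcal{O}$ consists of the single transitivity rule

$$edge(x, y) \land edge(y, z) \rightarrow edge(x, z).$$

Clearly, $\mathcal{O} \cup \mathcal{D}$ entails each Boolean CQ over the $edge$ relation. Consider the obstruction $\mathcal{U} = \exists y.\, edge(y, y)$, which defines the class of directed graphs with self loops. Suppose that some view $\mathcal{V}$ realises $\mathsf{ocens}_{\mathcal{U}}$. By Theorem 9, the obstruction $\mathcal{U}$ must define $\{\mathcal{I} \in \mathcal{C} \mid \mathcal{I} \not\to \mathcal{H}_{\mathcal{O},\mathcal{V}}\}$, where $\mathcal{C}$ is the class of all directed graphs. Thus, any graph $G$ must satisfy the property

$G$ has no self loops iff $G \to \mathcal{H}_{\mathcal{O},\mathcal{V}}$.

In particular, $\mathcal{H}_{\mathcal{O},\mathcal{V}}$ itself has no self loops. Since it is closed under the transitivity rule in $\mathcal{O}$, any cycle would produce a self loop, so $\mathcal{H}_{\mathcal{O},\mathcal{V}}$ is a DAG. Take a DAG $G$ extending (a graph isomorphic to) $\mathcal{H}_{\mathcal{O},\mathcal{V}}$ with a new node $v$ and edges connecting all its sink nodes to $v$. Clearly, $G$ has no self loops. However, $G$ contains a directed path longer than any path in the finite DAG $\mathcal{H}_{\mathcal{O},\mathcal{V}}$, so $G \not\to \mathcal{H}_{\mathcal{O},\mathcal{V}}$, which is a contradiction. □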
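The last step of the proof can be checked on a small case. The Python sketch below is our illustration, not part of the proof. It takes a transitively closed DAG standing in for $\mathcal{H}_{\mathcal{O},\mathcal{V}}$ and adds a new node, named `v` by assumption, with an edge from every sink. Exhaustive search then confirms that the extended graph has no self loops yet does not map homomorphically into the original.

```python
from itertools import product


def maps_into(src_edges, dst_edges):
    """Exhaustive homomorphism test between two edge sets."""
    src = sorted({u for e in src_edges for u in e})
    dst = sorted({u for e in dst_edges for u in e})
    for image in product(dst, repeat=len(src)):
        h = dict(zip(src, image))
        if all((h[u], h[w]) in dst_edges for u, w in src_edges):
            return True
    return False


def extend_with_new_sink(dag, new_node="v"):
    """G = dag plus an edge from every sink of dag to a fresh node (assumed unused)."""
    nodes = {u for e in dag for u in e}
    sinks = {u for u in nodes if not any(x == u for x, _ in dag)}
    return set(dag) | {(s, new_node) for s in sinks}


H = {("0", "1"), ("0", "2"), ("1", "2")}  # transitive DAG; longest path has 2 edges
G = extend_with_new_sink(H)
print(sorted(G))                           # [('0', '1'), ('0', '2'), ('1', '2'), ('2', 'v')]
print(any(u == w for u, w in G), maps_into(G, H))  # False False
```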

A.2 Proofs for Section 5

Theorem 12. The problem of checking whether a Datalog CQE instance admits an optimal view is undecidable. 

Proof. The proof is by reduction from the following problem: does a deterministic Turing machine without a final state have a 
repeated configuration? This problem is undecidable by Rice’s Theorem. 

Formally, consider any such Turing machine $M = (\Gamma, Q, q_0, \delta)$, where $\Gamma$ is a tape alphabet that includes the blank symbol $0$, $Q$ is a set of states, $q_0 \in Q$ is an initial state, and $\delta : \Gamma \times Q \to \Gamma \times (Q \setminus \{q_0\}) \times \{+, -\}$ is a transition function. For every such $M$ we construct a Datalog CQE instance $\mathbb{I}_M = (\mathcal{O}, \mathcal{D}, \mathcal{P})$ that admits an optimal view if and only if $M$, starting on the empty tape, has a repeated configuration. The notion of configuration is as usual: it is the content of the tape and the head pointer to a cell on the tape. Note that the transition function $\delta$ is defined in such a way that the initial state does not appear anywhere in a computation except in the initial configuration. This clearly does not affect the undecidability of the problem. We also assume that the tape of the machine is infinite in both directions, and that all of it can freely be used for computations.
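The source problem of the reduction can be explored, though not decided, by simulation. The Python sketch below is our illustration, with hypothetical names. It runs a deterministic machine from the empty tape, where `delta` maps `(symbol, state)` to `(symbol, state, move)` and moves are encoded as `+1`/`-1` (an assumption). If a configuration repeats within a step bound, it returns the step numbers $m < n$ of the first repetition. The bound is unavoidable, since the problem is undecidable.

```python
def first_repeat(delta, q0, blank="0", max_steps=10_000):
    """Simulate a deterministic TM without final states from the empty tape.

    A configuration is the state, the absolute head position, and the
    non-blank tape content. Returns (m, n) for the first pair of equal
    configurations, or None if no repetition occurs within max_steps."""
    tape, head, state = {}, 0, q0
    seen = {}
    for step in range(max_steps + 1):
        config = (state, head,
                  frozenset((c, s) for c, s in tape.items() if s != blank))
        if config in seen:
            return seen[config], step
        seen[config] = step
        symbol = tape.get(head, blank)
        tape[head], state, move = delta[(symbol, state)]
        head += move
    return None


# Hypothetical machine: step right once, then bounce between cells 0 and 1.
bounce = {("0", "q0"): ("0", "q1", +1),
          ("0", "q1"): ("0", "q2", -1),
          ("0", "q2"): ("0", "q1", +1)}
print(first_repeat(bounce, "q0"))  # (1, 3)
```

As the construction requires, the example machine never re-enters its initial state `q0`.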

We start the construction of $\mathbb{I}_M$ from the dataset $\mathcal{D}$. It uses only one constant $a$ and consists of three binary atoms:

$$R(a, a),\quad S(a, a),\quad T(a, a).$$

The predicate T is intended to point to the next cell on the tape, the predicate S points to the same cell in the following 
configuration, and the predicate R is responsible for initialisation. We start the definition of the ontology O with the description 
of the role of R. Let O contain rules 


$$R(x, x) \rightarrow I(x), \tag{2}$$

$$R(x, y) \land R(y, z) \rightarrow R(x, z), \tag{3}$$

$$R(x, y) \land I(y) \rightarrow I(x). \tag{4}$$

As we will see formally later, these rules guarantee that if $\mathbb{I}_M$ admits an optimal view, then this view contains the fact $I(a)$. This fact initialises the tape by means of the following rules (conjunction in heads is just syntactic sugar):

$$I(x) \rightarrow I^{+}(x) \land I^{-}(x) \land C_{q_0}(x) \land A_{0}(x), \tag{5}$$

$$I^{+}(x) \land T(x, y) \rightarrow I^{+}(y) \land C_{\emptyset}(y) \land A_{0}(y), \tag{6}$$

$$I^{-}(y) \land T(x, y) \rightarrow I^{-}(x) \land C_{\emptyset}(x) \land A_{0}(x). \tag{7}$$


In these rules, $C_{q_0}$ is a unary predicate indicating that the head is pointing to the first cell and that the state is $q_0$. For each other state $q$ in $Q$, the vocabulary contains the corresponding predicate $C_q$. The rest of the tape should always be marked by the predicate $C_{\emptyset}$, which indicates that the head does not point to this cell. Similarly, if in some configuration a cell contains an alphabet symbol $g \in \Gamma$, then this is indicated by the predicate $A_g$; for example, the rules above ensure that the tape is initialised with the symbol $0$. To ensure the consistency of the computation grid, which is constructed by means of the tape and time predicates $T$ and $S$, the ontology contains the rules


$$T(x, y) \land T(z, u) \land S(y, u) \rightarrow S(x, z), \tag{8}$$

$$T(x, y) \land T(z, u) \land S(x, z) \rightarrow S(y, u). \tag{9}$$


Finally, we need to make adjacent configurations consistent. In particular, the content of each cell, and whether the head is pointing to this cell in some particular state (that is, the cell's $C_q$ and $A_g$ labels), are completely determined by the labels of the three cells in the previous configuration. So, abbreviating $T(x, y) \land T(y, z) \land S(y, u) \land A_{g^-}(x) \land A_{g}(y) \land A_{g^+}(z)$ by $\varphi(x, y, z, u)$, the ontology $\mathcal{O}$ contains the rules

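$$\varphi(x, y, z, u) \land C_{\emptyset}(x) \land C_{q}(y) \land C_{\emptyset}(z) \rightarrow C_{\emptyset}(u) \land A_{g'}(u), \quad \text{for all } g^-, g^+ \in \Gamma, \text{ if } \delta(g, q) = (g', q', d) \text{ for some } q', d,$$

$$\varphi(x, y, z, u) \land C_{q}(x) \land C_{\emptyset}(y) \land C_{\emptyset}(z) \rightarrow C_{\emptyset}(u) \land A_{g}(u), \quad \text{for all } g, g^+ \in \Gamma, \text{ if } \delta(g^-, q) = (g', q', -) \text{ for some } g', q',$$

$$\varphi(x, y, z, u) \land C_{\emptyset}(x) \land C_{\emptyset}(y) \land C_{q}(z) \rightarrow C_{\emptyset}(u) \land A_{g}(u), \quad \text{for all } g^-, g \in \Gamma, \text{ if } \delta(g^+, q) = (g', q', +) \text{ for some } g', q',$$

$$\varphi(x, y, z, u) \land C_{q}(x) \land C_{\emptyset}(y) \land C_{\emptyset}(z) \rightarrow C_{q'}(u) \land A_{g}(u), \quad \text{for all } g, g^+ \in \Gamma, \text{ if } \delta(g^-, q) = (g', q', +) \text{ for some } g',$$

$$\varphi(x, y, z, u) \land C_{\emptyset}(x) \land C_{\emptyset}(y) \land C_{q}(z) \rightarrow C_{q'}(u) \land A_{g}(u), \quad \text{for all } g^-, g \in \Gamma, \text{ if } \delta(g^+, q) = (g', q', -) \text{ for some } g',$$

$$\varphi(x, y, z, u) \land C_{\emptyset}(x) \land C_{\emptyset}(y) \land C_{\emptyset}(z) \rightarrow C_{\emptyset}(u) \land A_{g}(u), \quad \text{for all } g^-, g, g^+ \in \Gamma.$$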

Having defined the ontology, we complete the construction by specifying the policy. It consists of several BCQs, but translating it into a single CQ by means of several rules in the ontology is straightforward. The policy $\mathcal{P}$ guarantees that a cell cannot contain several alphabet symbols, that the machine cannot be in several states, and that the head cannot simultaneously point and not point to a cell. This is formalised as the following set of BCQs:

$$\exists x.\, A_{g}(x) \land A_{g'}(x), \quad \text{for all } g, g' \in \Gamma \text{ such that } g \neq g',$$

$$\exists x.\, C_{q}(x) \land C_{q'}(x), \quad \text{for all } q, q' \in Q \cup \{\emptyset\} \text{ such that } q \neq q'.$$

Having completed the construction, we next prove formally that $M$ has a repeated configuration if and only if $\mathbb{I}_M$ has a (finite) optimal view. We start with the forward direction.

(⇒)

Let the first pair of repeated configurations of $M$ have numbers $m$ and $n$, with $m < n$. Let the smallest (non-positive) number of a cell whose content was changed during the computation be $k + 1$, and the biggest (non-negative) such number be $\ell - 1$; we assume that initially the head is pointing to cell number $0$. Note that $k$ and $\ell$ are finite, because a computation cannot use an infinite number of cells in a finite number of steps. In fact, $k \geq -n$ and $\ell \leq n$.

The view $\mathcal{V}$ makes use of constants $a_{ij}$ with $-1 \leq i < n$ and $k \leq j \leq \ell$, such that $a_{00} = a$ and all others are anonymous copies of $a$. By means of the binary predicates $S$ and $T$, these constants form a grid; that is, the view contains the atoms

$$S(a_{(i-1)j}, a_{ij}), \quad \text{for all } 0 < i < n,\ k \leq j \leq \ell,$$

$$T(a_{i(j-1)}, a_{ij}), \quad \text{for all } -1 \leq i < n,\ k < j \leq \ell.$$

The grid is “folded” on all sides: by means of self loops in configuration number $i = -1$ and in cells number $k$ and $\ell$, and at the repeated configurations $m$ and $n$:

$$S(a_{(-1)j}, a_{(-1)j}), \quad \text{for all } k \leq j \leq \ell,$$

$$T(a_{ik}, a_{ik}), \quad \text{for all } -1 \leq i < n,$$

$$T(a_{i\ell}, a_{i\ell}), \quad \text{for all } -1 \leq i < n,$$

$$S(a_{(n-1)j}, a_{mj}), \quad \text{for all } k \leq j \leq \ell.$$

Each configuration with number $0 \leq i < n$ has the word $g_k \cdots g_\ell$ written on the part of the tape with cell numbers from $k$ to $\ell$, state $q$, and the head pointing to cell number $h$. It is represented by means of the following facts:

$$A_{g_j}(a_{ij}), \quad \text{for all } k \leq j \leq \ell,$$

$$C_{q}(a_{ih}),$$

$$C_{\emptyset}(a_{ij}), \quad \text{for all } k \leq j \leq \ell,\ j \neq h.$$

The auxiliary “configuration” number $-1$ is the same as a usual configuration with the empty tape, except that the head does not point anywhere:

$$A_{0}(a_{(-1)j}), \quad \text{for all } k \leq j \leq \ell,$$

$$C_{\emptyset}(a_{(-1)j}), \quad \text{for all } k \leq j \leq \ell,\ j \neq h.$$

The constant $a$ is in the initialisation predicates:

$$R(a, a),\quad I(a).$$

Finally, each configuration with number $-1 \leq i < n$ (i.e., including the auxiliary one) has cells with numbers $k'$ and $\ell'$ such that all cells between $k$ and $k'$, as well as all cells between $\ell'$ and $\ell$, contain $0$ and do not have the head pointing to them. The first group is marked by $I^-$ and the second by $I^+$:

$$I^{-}(a_{ij}), \quad \text{for all } k \leq j \leq k',$$

$$I^{+}(a_{ij}), \quad \text{for all } \ell' \leq j \leq \ell.$$

It is straightforward to see that $\mathcal{V} \models \mathcal{O}$ and $\mathcal{O} \cup \mathcal{V} \not\models \mathcal{P}$; that is, $\mathcal{V}$ is a confidentiality preserving view for $\mathbb{I}_M$. Checking that the view is indeed optimal is a technical matter.

(⇐)



Next we show that if the machine $M$ does not have a repeated configuration, then there is no optimal view for the instance $\mathbb{I}_M$. Assume, for the sake of contradiction, that such a view $\mathcal{V}$ exists. Without loss of generality, we may assume that $\mathcal{V} \models \mathcal{O}$. The first fact we need is the following claim.

Claim 29. The view $\mathcal{V}$ contains the atom $I(a)$.

Proof. Whatever the shape of $\mathcal{V}$, it entails the BCQs

$$Q^R_i = \exists x_1 \ldots \exists x_i.\, R(a, x_1) \land R(x_1, x_2) \land \cdots \land R(x_{i-1}, x_i), \quad \text{for all } i \geq 1.$$

Since $i$ is unbounded but $\mathcal{V}$ is finite, there exists $i_0$ such that there is a homomorphism from the body of $Q^R_{i_0}$ to $\mathcal{V}$ that sends different $x_j$ and $x_k$ to the same constant. This means that $\mathcal{V}$ contains an $R$-loop of some length, which is connected to $a$ by an $R$-chain. By rules (2)–(4), this implies that $I(a)$ is a fact in $\mathcal{V}$. □
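The effect of rules (2)–(4) used in Claim 29 can be reproduced with a few lines of forward chaining. In this Python sketch (our illustration; the function name is hypothetical), the $R$-facts of a finite view are first saturated under the transitivity rule (3). Rules (2) and (4) then propagate $I$ backwards along $R$. As a result, $I(a)$ is derived exactly when an $R$-loop is reachable from $a$.

```python
def derives_I(r_facts, a):
    """Saturate rules (2)-(4) over a finite set of R-pairs; is I(a) derived?

    (2) R(x,x) -> I(x)   (3) R(x,y), R(y,z) -> R(x,z)   (4) R(x,y), I(y) -> I(x)
    Rules (2) and (4) never derive R-atoms, so (3) can be saturated first."""
    R = set(r_facts)
    while True:  # rule (3): transitive closure
        new = {(x, z) for (x, y) in R for (y2, z) in R if y == y2} - R
        if not new:
            break
        R |= new
    I = {x for (x, y) in R if x == y}  # rule (2)
    while True:  # rule (4)
        new = {x for (x, y) in R if y in I} - I
        if not new:
            break
        I |= new
    return a in I


print(derives_I({("a", "b"), ("b", "c"), ("c", "b")}, "a"))  # True: loop b -> c -> b
print(derives_I({("a", "b"), ("b", "c")}, "a"))              # False: no R-loop
```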


As in the proof of the claim above, whatever the shape of $\mathcal{V}$, it entails the BCQs

$$Q^S_i = \exists x_1 \ldots \exists x_i.\, S(a, x_1) \land S(x_1, x_2) \land \cdots \land S(x_{i-1}, x_i), \quad \text{for all } i \geq 1.$$

Since $\mathcal{V}$ is finite, this implies that there is a (finite) biggest number $n - 1$ such that the body of $Q^S_{n-1}$ has a homomorphism to $\mathcal{V}$ that sends different $x_j$ to different constants.

Consider now a “grid” BCQ that consists of the following atoms:

$$S(x_{(i-1)j}, x_{ij}), \quad \text{for all } 0 < i \leq n,\ -n \leq j \leq n,$$

$$T(x_{i(j-1)}, x_{ij}), \quad \text{for all } 0 \leq i \leq n,\ -n < j \leq n,$$

$$x_{00} = a.$$


This query is also “harmless”; that is, it should be entailed by $\mathcal{V}$ whatever its shape. This BCQ has a chain of $S$-atoms starting from $a$ of length greater than $n - 1$. Hence, for any homomorphism from its body to $\mathcal{V}$, there are numbers $k < \ell$ such that this homomorphism sends $x_{k0}$ and $x_{\ell 0}$ to the same constant. Let $h$ be such a homomorphism, and let $k$ and $\ell$ be the numbers corresponding to $h$. By rules (8) and (9), $\mathcal{V}$ contains the atoms

$$S(h(x_{(\ell-1)j}), h(x_{kj})), \quad \text{for all } -n \leq j \leq n. \tag{10}$$


On the other hand, since $I(a)$ is in $\mathcal{V}$, rules (5)–(7) ensure that the constants $h(x_{0j})$ for $-n \leq j \leq n$ represent the part of the initial configuration on cells with numbers from $-n$ to $n$. Furthermore, by means of the rules corresponding to the transition function of the machine, the constants $h(x_{ij})$ form the corresponding part of the configuration with number $i$, for all $0 \leq i \leq \ell$. By the same rules and the atoms (10), the constants $h(x_{kj})$ represent not only this part of configuration number $k$, but also of configuration number $\ell$. If these parts were different, this would disclose the policy, so they are the same. The rest of the two configurations, that is, the content of the tape beyond the cells with numbers from $-n$ to $n$, is also the same, because it consists only of the symbol $0$ (the head cannot reach this part of the tape because it is too far away). So, $M$ has a repeated configuration, which contradicts the assumption. □


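Proposition 30. The censor $\mathsf{vcens}_{\mathcal{V}}$ based on a view $\mathcal{V}$ is confidentiality preserving for a CQE instance $\mathbb{I} = (\mathcal{O}, \mathcal{D}, \mathcal{P})$ if and only if $\mathcal{O} \cup \mathcal{V} \not\models \mathcal{P}(\vec s)$ for each $\vec s \in \mathsf{cert}(\mathcal{P}, \mathcal{O}, \mathcal{D})$. Additionally, it is optimal if and only if the following holds for each CQ $Q(\vec x)$ and each $\vec t \in \mathsf{cert}(Q, \mathcal{O}, \mathcal{D})$: if $\mathcal{O} \cup \mathcal{V} \cup \{Q(\vec t)\} \not\models \mathcal{P}(\vec s)$ for each $\vec s \in \mathsf{cert}(\mathcal{P}, \mathcal{O}, \mathcal{D})$, then $\vec t \in \mathsf{cert}(Q, \mathcal{O}, \mathcal{V})$.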

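Proof. Assume that $\mathcal{O} \cup \mathcal{V} \not\models \mathcal{P}(\vec s)$ for each $\vec s \in \mathsf{cert}(\mathcal{P}, \mathcal{O}, \mathcal{D})$. Trivially, $\mathcal{O} \cup \mathcal{V} \models \mathsf{Th}_{\mathsf{vcens}_{\mathcal{V}}}$, and hence $\mathcal{O} \cup \mathsf{Th}_{\mathsf{vcens}_{\mathcal{V}}} \not\models \mathcal{P}(\vec s)$ for each $\vec s \in \mathsf{cert}(\mathcal{P}, \mathcal{O}, \mathcal{D})$, as required.

Assume now that $\mathsf{vcens}_{\mathcal{V}}$ is confidentiality preserving, in which case $\mathcal{O} \cup \mathsf{Th}_{\mathsf{vcens}_{\mathcal{V}}} \not\models \mathcal{P}(\vec s)$ for each $\vec s \in \mathsf{cert}(\mathcal{P}, \mathcal{O}, \mathcal{D})$. Next, assume for the sake of contradiction that $\mathcal{O} \cup \mathcal{V} \models \mathcal{P}(\vec s)$ for some $\vec s \in \mathsf{cert}(\mathcal{P}, \mathcal{O}, \mathcal{D})$. We also have $\mathcal{O} \cup \mathcal{D} \models \mathcal{P}(\vec s)$, so by the definition of view censor, $\mathsf{vcens}_{\mathcal{V}}$ answers $\mathcal{P}(\vec s)$ positively, and thus $\mathcal{P}(\vec s) \in \mathsf{Th}_{\mathsf{vcens}_{\mathcal{V}}}$. Therefore, $\mathcal{O} \cup \mathsf{Th}_{\mathsf{vcens}_{\mathcal{V}}} \models \mathcal{P}(\vec s)$, which is a contradiction.

We next focus on the optimality statement. Assume that the condition of the proposition holds, but $\mathsf{vcens}_{\mathcal{V}}$ is not optimal. Then there is a confidentiality preserving censor $\mathsf{cens}$ that extends $\mathsf{vcens}_{\mathcal{V}}$. This means that, for some CQ $Q(\vec x)$ and $\vec t \in \mathsf{cert}(Q, \mathcal{O}, \mathcal{D})$, we have $\vec t \in \mathsf{cens}(Q)$ but $\vec t \notin \mathsf{vcens}_{\mathcal{V}}(Q)$. Since $\vec t \notin \mathsf{vcens}_{\mathcal{V}}(Q)$ and $\vec t \in \mathsf{cert}(Q, \mathcal{O}, \mathcal{D})$, we have $\vec t \notin \mathsf{cert}(Q, \mathcal{O}, \mathcal{V})$. Furthermore, since $\mathsf{cens}$ is confidentiality preserving, $\mathcal{O} \cup \mathsf{Th}_{\mathsf{cens}} \cup \{Q(\vec t)\} \not\models \mathcal{P}(\vec s)$ for each $\vec s \in \mathsf{cert}(\mathcal{P}, \mathcal{O}, \mathcal{D})$. But since $\mathsf{cens}$ extends $\mathsf{vcens}_{\mathcal{V}}$, we have $\mathsf{Th}_{\mathsf{vcens}_{\mathcal{V}}} \subseteq \mathsf{Th}_{\mathsf{cens}}$, and hence $\mathcal{O} \cup \mathsf{Th}_{\mathsf{vcens}_{\mathcal{V}}} \cup \{Q(\vec t)\} \not\models \mathcal{P}(\vec s)$. Therefore $\vec t \in \mathsf{vcens}_{\mathcal{V}}(Q)$, which is a contradiction.

Finally, assume that there exist a CQ $Q(\vec x)$ and $\vec t \in \mathsf{cert}(Q, \mathcal{O}, \mathcal{D})$ such that $\mathcal{O} \cup \mathcal{V} \cup \{Q(\vec t)\} \not\models \mathcal{P}(\vec s)$ for each $\vec s \in \mathsf{cert}(\mathcal{P}, \mathcal{O}, \mathcal{D})$, but $\mathcal{O} \cup \mathcal{V} \not\models Q(\vec t)$. Then we can define a censor $\mathsf{cens}$ that behaves exactly like $\mathsf{vcens}_{\mathcal{V}}$, except that it answers $Q(\vec t)$ positively. Thus, $\mathsf{Th}_{\mathsf{cens}} = \mathsf{Th}_{\mathsf{vcens}_{\mathcal{V}}} \cup \{Q(\vec t)\}$. Since $\mathcal{O} \cup \mathcal{V} \cup \{Q(\vec t)\} \not\models \mathcal{P}(\vec s)$ for each $\vec s \in \mathsf{cert}(\mathcal{P}, \mathcal{O}, \mathcal{D})$ and $\mathcal{O} \cup \mathcal{V} \models \mathsf{Th}_{\mathsf{vcens}_{\mathcal{V}}}$, we have $\mathcal{O} \cup \mathsf{Th}_{\mathsf{cens}} \not\models \mathcal{P}(\vec s)$. Hence $\mathsf{cens}$ is confidentiality preserving, and $\mathsf{vcens}_{\mathcal{V}}$ is not optimal, as required. □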
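Proposition 30 reduces the confidentiality of a view censor to one entailment test per policy answer. The Python sketch below illustrates this on a toy instance using a naive forward-chaining Datalog engine. Several choices are ours, not the paper's: the rule format, the `?`-prefix convention for variables, and the assumption that the policy answers $\mathcal{P}(\vec s)$ are supplied already instantiated as Boolean queries.

```python
def match(body, facts, subst):
    """Yield every extension of subst that maps all body atoms into facts."""
    if not body:
        yield subst
        return
    (pred, args), rest = body[0], body[1:]
    for fpred, fargs in facts:
        if fpred != pred or len(fargs) != len(args):
            continue
        s, ok = dict(subst), True
        for t, c in zip(args, fargs):
            if t.startswith("?"):           # variable: bind it or check its binding
                ok = s.setdefault(t, c) == c
            else:                           # constant: must match exactly
                ok = t == c
            if not ok:
                break
        if ok:
            yield from match(rest, facts, s)


def saturate(rules, facts):
    """Least model of rules and facts by naive forward chaining (safe rules assumed)."""
    model = set(facts)
    while True:
        new = {(hp, tuple(s.get(t, t) for t in hargs))
               for body, (hp, hargs) in rules
               for s in match(body, model, {})} - model
        if not new:
            return model
        model |= new


def entails(rules, facts, bcq):
    """rules + facts |= bcq iff the query body maps into the least model."""
    return next(match(bcq, saturate(rules, facts), {}), None) is not None


def view_is_confidential(rules, view, policy_instances):
    """Proposition 30: O and V together must entail none of the policy instances."""
    return not any(entails(rules, view, p) for p in policy_instances)


trans = [([("R", ("?x", "?y")), ("R", ("?y", "?z"))], ("R", ("?x", "?z")))]
data = {("R", ("a", "b")), ("R", ("b", "c"))}
policy = [[("R", ("a", "c"))]]
print(view_is_confidential(trans, {("R", ("a", "b"))}, policy))  # True
print(view_is_confidential(trans, data, policy))                 # False
```

In this toy instance, the certain fact $R(b, c)$ cannot be added to the view $\{R(a, b)\}$ without disclosing the policy. This is precisely the kind of query that the optimality condition of Proposition 30 exempts.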



We say that a rule is normalised if it has at most two atoms in its body; an ontology is normalised if it is a set of normalised 
rules. Clearly, any guarded ontology can be normalised. 

Definition 31. Let $\Sigma$ be a signature, $\mathcal{O}$ an ontology over $\Sigma$, and $S$ a subset of $\Sigma$ consisting of unary predicates. $S$ is closed under $\mathcal{O}$ if (i) $\mathcal{O} \cup \{A(x) \mid A \in S\} \models C(x)$ implies that $C \in S$, and (ii) if $A$ does not occur in $\mathcal{O}$, then $A \in S$.
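Consider ontologies whose relevant rules are purely unary, with bodies $A_1(x) \land \dots \land A_k(x)$ and head $C(x)$. For these, condition (i) of Definition 31 amounts to closure under forward chaining. The Python sketch below enumerates the closed subsets of a label set. It is our illustration: it ignores condition (ii) and handles only such unary rules. In the proof of Theorem 14, this count bounds the number of copies of a constant.

```python
from itertools import combinations


def closure(labels, rules):
    """Unary predicates entailed at x by O together with {A(x) | A in labels}.
    rules: pairs (frozenset_of_body_predicates, head_predicate)."""
    result = set(labels)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if body <= result and head not in result:
                result.add(head)
                changed = True
    return result


def closed_subsets(labels, rules):
    """All subsets S of labels satisfying condition (i) of Definition 31."""
    return [set(S)
            for k in range(len(labels) + 1)
            for S in combinations(sorted(labels), k)
            if closure(S, rules) == set(S)]


rules = [(frozenset({"A"}), "B")]                  # A(x) -> B(x)
print(len(closed_subsets({"A", "B"}, rules)))      # 3: {}, {B}, {A, B}
```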

Theorem 14. Let $\mathbb{I}$ be a Datalog tree-shaped CQE instance. If $\mathbb{I}$ is guarded, it admits an optimal view that can be computed in time exponential in $|\mathbb{I}|$ and polynomial in data size. If $\mathbb{I}$ is multi-linear, it admits an optimal view that can be computed in time polynomial in $|\mathbb{I}|$. Additionally, $\mathbb{I}$ has a unique optimal censor if it is linear.


Proof.

Guarded, tree-shaped CQE instance. The view-construction algorithm builds a view for a given CQE instance $\mathbb{I} = (\mathcal{O}, \mathcal{D}, \mathcal{P})$. We show that if $\mathbb{I}$ is tree-shaped and guarded, then the algorithm returns an optimal view for $\mathbb{I}$. By construction, the computed dataset $\mathcal{V}$ is safe, so it remains to prove its optimality. By Proposition 30, it suffices to show that, for each CQ $Q$ and each tuple $\vec t \in \mathsf{cert}(Q, \mathcal{O}, \mathcal{D})$:

$$\text{if } \mathcal{O} \cup \mathcal{V} \cup [Q(\vec t)] \not\models \mathcal{P}(\vec s) \text{ for each } \vec s \in \mathsf{cert}(\mathcal{P}, \mathcal{O}, \mathcal{D}), \text{ then } \vec t \in \mathsf{cert}(Q, \mathcal{O}, \mathcal{V}). \tag{11}$$

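Observe the following.

(O1) Without loss of generality, we can assume that $\mathcal{V} \cap \mathcal{H}_{\mathcal{O},\mathcal{D}} \subseteq [Q(\vec t)]$.

(O2) If $\mathcal{H}$ is as defined in the algorithm, then $\mathcal{H}_{\mathcal{O},\mathcal{D}} \subseteq \mathcal{H}$, and $\mathcal{H} \setminus \mathcal{H}_{\mathcal{O},\mathcal{D}}$ consists only of unary atoms over the fresh predicates introduced into $\mathcal{H}$ at Line 1.

(O3) No rule of $\mathcal{O}_E$ can be applied to $\mathcal{V}$.

Assume that $Q(\vec t)$ satisfies the “if”-clause of Equation (11). Since $\vec t \in \mathsf{cert}(Q, \mathcal{O}, \mathcal{D})$ by assumption, there is a homomorphism $h$ from $[Q(\vec t)]$ to $\mathcal{H}_{\mathcal{O},\mathcal{D}}$. It is easy to see that $h$ is then also a homomorphism from $[Q(\vec t)]$ to $\mathcal{H}_{\mathcal{O}_E,\mathcal{D}}$.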

We are going to use the following notations. 

• Denote as R 

• Let $\mathcal{X}$ be a dataset and $d$ an element occurring in $\mathcal{X}$. Then we define the set $\mathsf{conc}_{\mathcal{X}}(d)$ as $\{A \mid A(d) \in \mathcal{X}\}$.

We now show the existence of a homomorphism $g : \mathcal{B} \to \mathcal{V}$, which would prove that $Q$ satisfies the “then”-clause of Equation (11). Let $d_1, \ldots, d_m$ be all the fresh constants from $[Q(\vec t)]$, let $d$ be an element from $[Q(\vec t)]$, and let $h$ be a homomorphism from $\mathcal{B}$ into $\mathcal{H}$. We claim that there exists a $g$ with the following properties:

1. If $d$ is from $\mathcal{O} \cup \mathcal{D}$, then $g(d) = d$.

2. Let $d = d_i$ and $h(d_i) = a$. Then $g(d_i) = a'$ such that $a' \in \sigma_a$, $a' \neq a$, and $\mathsf{conc}_{\mathcal{V}}(a') = \mathsf{conc}_{\mathcal{B}}(d)$. Here $\sigma_a$ is the set of all “copies” of $a$ introduced by the algorithm (see, for example, its sub-routines).

It remains to show that $g$ does indeed exist and maps $\mathcal{B}$ into $\mathcal{V}$. To this end, we need to show that

1. for each element $d$ from $[Q(\vec t)]$, there is an element $a'$ in $\mathcal{V}$ satisfying the second property of $g$, and

2. for each binary atom $R(d_1, d_2) \in [Q(\vec t)]$, there exists a corresponding binary atom $R(g(d_1), g(d_2)) \in \mathcal{V}$.

The former requirement follows from the construction of $\mathcal{V}$. The latter requires that

$$\mathsf{cert}(\mathcal{P}, \mathcal{O}_E, \mathcal{V} \cup g(\mathcal{B})) = \emptyset. \tag{12}$$

Note that Equation (11) implies that $\mathsf{cert}(\mathcal{P}, \mathcal{O}_E, \mathcal{V} \cup \mathcal{B}) = \emptyset$. Also observe that no rule from $\mathcal{O}_E$ is applicable to $\mathcal{V} \cup \mathcal{B}$. Indeed, by construction, no rule is applicable to $\mathcal{V}$ or to $\mathcal{B}$. Assume that a rule $r$ is applicable to $\mathcal{V} \cup \mathcal{B}$. If the body of $r$ contains one atom, we immediately obtain a contradiction. If the body of $r$ contains two atoms, then there exist an atom $f_1 \in \mathcal{V}$ and an atom $f_2 \in \mathcal{B}$ such that $f_1 \land f_2$ is an instantiation of the body of $r$. Assume that the atom in the body of $r$ corresponding to $f_1$ is a guard of the rule; then all constants occurring in $f_2$ also occur in $f_1$. Since $\mathcal{B}$ and $\mathcal{V}$ share only “active” constants (i.e., the ones from $\mathbb{I}$), Observation (O1) gives $f_2 \in \mathcal{V} \cap \mathcal{B}$. Thus $r$ is applicable to $\mathcal{V}$, which is a contradiction.

Assume that Equation (12) does not hold. Then there is a rule $r \in \mathcal{O}$ applicable to $\mathcal{V} \cup g(\mathcal{B})$. Recall that $r$ is not applicable to $\mathcal{V}$. We distinguish the following cases, depending on the shape of $r$.

1. $r$ is of the form $A(x) \rightarrow C(x)$, $A(x) \land B(x) \rightarrow C(x)$, or $A \land B(x) \rightarrow C(x)$. Clearly, in this case $r$ is applicable to $g(\mathcal{B})$. Since $\mathsf{conc}_{\mathcal{B}}(d) = \mathsf{conc}_{\mathcal{V}}(g(d))$ for every $d$ in $\mathcal{B}$, $r$ is then also applicable to $\mathcal{B}$, which contradicts the observation above.

2. $r$ is of the form $R(x, y) \rightarrow \mathit{Head}(x, y)$ or $A \land R(x, y) \rightarrow \mathit{Head}(x, y)$, where $\mathit{Head}(x, y)$ has one of the following forms, for some unary predicate $C$ or binary predicate $Q$: $C(x)$, $C(y)$, $Q(x, y)$, or $Q(y, x)$. Here we obtain a contradiction as in the previous case.

3. $r$ is of the form $R(x, y) \land A(x) \rightarrow \mathit{Head}(x, y)$. There are three cases.

(a) There are $a$, $b$, and $d_i$ such that $R(a, b) \in \mathcal{V}$ and $A(d_i) \in \mathcal{B}$, where $g(d_i) = a$. Since $r$ is not applicable to $\mathcal{B}$, $R(d_i, c) \notin \mathcal{B}$ for any element $c$ occurring in $\mathcal{B}$. Thus $S_R(d_i) \notin \mathcal{B}$, and consequently $S_R(a) \notin \mathcal{V}$. This contradicts the assumption that $R(a, b) \in \mathcal{V}$.

(b) There are $a$, $b'$, and $d_i$ such that $R(d_i, b') \in \mathcal{B}$ and $A(a) \in \mathcal{V}$, where $g(d_i) = a$. Since $r$ is not applicable to $\mathcal{B}$, $A(d_i) \notin \mathcal{B}$, and thus $A(g(d_i)) \notin \mathcal{V}$. This contradicts $A(a) \in \mathcal{V}$.

(c) There are $a$, $b$, $b'$, $d_i$, and $d_j$ such that $R(d_i, b') \in \mathcal{B}$, $A(d_j) \in \mathcal{B}$, $g(d_i) = g(d_j) = a$, and $g(b') = b$. Then $\mathsf{conc}_{\mathcal{B}}(d_i) = \mathsf{conc}_{\mathcal{V}}(a) = \mathsf{conc}_{\mathcal{B}}(d_j)$, and consequently $A(d_i) \in \mathcal{B}$. Suppose $\mathit{Head}(d_i, b')$ is $C(d_i)$ or $C(b')$ for some unary predicate $C$. Then $C \in \mathsf{conc}_{\mathcal{B}}(d_i)$ or $C \in \mathsf{conc}_{\mathcal{B}}(b')$, respectively, and thus $C(g(d_i)) \in \mathcal{V}$ or $C(g(b')) \in \mathcal{V}$, respectively. Suppose instead $\mathit{Head}(d_i, b')$ is $Q(d_i, b')$ for some binary predicate $Q$. Then $S_Q \in \mathsf{conc}_{\mathcal{B}}(d_i)$ and $P_Q \in \mathsf{conc}_{\mathcal{B}}(b')$, so $S_Q(g(d_i))$ and $P_Q(g(b'))$ are in $\mathcal{V}$. Therefore, the CheckRole sub-routine of the algorithm would return $\mathsf{True}$ on input $(Q(g(d_i), g(b')), \mathcal{V})$, and thus $Q(g(d_i), g(b')) \in \mathcal{V}$. Each subcase yields a contradiction, which concludes the case.

4. $r$ is of the form $R(x, y) \land A(y) \rightarrow \mathit{Head}(x, y)$. This case is analogous to the previous one.

Finally, $g(f)$ should be in $\mathcal{V}$ for each binary atom $f \in \mathcal{B}$, since (i) Equation (12) holds and (ii) all binary atoms that do not disclose the policy were exhaustively added to $\mathcal{V}$.

Regarding the size of $\mathcal{V}$, let $a$ be a constant occurring in $\mathbb{I}$ and let $\mathcal{C}$ be the set of unary predicates $A$ such that $A(a) \in \mathcal{H}$. Then the number of “copies” of $a$ added by the algorithm equals the number of subsets of $\mathcal{C}$ closed under $\mathcal{O}_E$ (see Definition 31). Clearly, this number is exponential in $|\mathcal{O}|$ and polynomial in $|\mathcal{D}|$.


Multi-linear, tree-shaped CQE instance. Let the CQE instance $\mathbb{I} = (\mathcal{O}, \mathcal{D}, \mathcal{P})$ be such that $\mathcal{O}$ is multi-linear Datalog, and let $\mathcal{V}$ be the dataset returned by the algorithm. For every constant $a$, the set $\sigma_a$ contains a copy of $a$ for each maximal subset $\mathcal{A}^*$ of $\{A \mid A(a) \in \mathcal{H}_{\mathcal{O},\mathcal{D}}\}$ closed under $\mathcal{O}_E$. It is easy to check that the number of such subsets is polynomial in the size of $\mathcal{O}$. Each such set $\mathcal{A}^*$ is a maximal set of labels (i.e., unary predicates) among all constants in $\sigma_a$; that is, if $a' \in \sigma_a$, then $\{A \mid A(a') \in \mathcal{V}\} \subseteq \mathcal{A}^*$ for some $\mathcal{A}^*$. We denote by $a^*$ an element of $\sigma_a$ such that $\mathsf{conc}_{\mathcal{V}}(a^*) = \mathcal{A}^*$.

Let $b$ be a constant from $\mathbb{I}$ and let $a'$ be from $\sigma_a$ such that $R(a, b)$ is in $\mathcal{H}_{\mathcal{O}_E,\mathcal{D}}$. Since $\mathbb{I}$ is multi-linear, $\mathcal{O}$ does not include rules with bodies of the form $R(x, y) \land A(x)$. Thus, whatever unary atoms $a'$ participates in, they cannot affect the atoms $b$ participates in. Hence we conclude the following:

(i) if $R(a', b)$ is in $\mathcal{V}$ for some $a' \in \sigma_a$ and some $b$ from $\mathbb{I}$, then so is $R(a^*, b)$ for a corresponding element $a^*$ from $\sigma_a$;

(ii) if $R(a', b')$ is in $\mathcal{V}$ for some $a' \in \sigma_a$ and $b' \in \sigma_b$, then so is $R(a^*, b^*)$ for corresponding elements $a^*$ from $\sigma_a$ and $b^*$ from $\sigma_b$.

Let $\mathcal{V}^*$ be the subset of $\mathcal{V}$ based on the constants $a$ from $\mathbb{I}$ and their copies $a^*$. Clearly, if $\vec t \in \mathsf{cert}(Q, \mathcal{O}, \mathcal{D})$ and $\vec t \in \mathsf{cert}(Q, \mathcal{O}, \mathcal{V})$ for some CQ $Q(\vec x)$, then $\vec t \in \mathsf{cert}(Q, \mathcal{O}, \mathcal{V}^*)$. This proves the optimality of $\mathcal{V}^*$.

The polynomial size of $\mathcal{V}^*$ follows from the observation that the sub-routine AddUnPredicates introduces only linearly many copies of a constant $a$ for each set of labels, including $\mathcal{A}^*$.


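Linear, tree-shaped CQE instance. Finally, assume that $\mathcal{O}$ is linear. Then there is a unique maximal subset $\mathcal{V}_0$ of $\mathcal{H}_{\mathcal{O}_E,\mathcal{D}}$ such that $\mathsf{cert}(\mathcal{P}, \mathcal{O}_E, \mathcal{V}_0) = \emptyset$, which gives the uniqueness of $\mathcal{V}$. □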


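Proposition 16. Let $\mathbb{I} = (\mathcal{O}, \mathcal{D}, \mathcal{P})$ be a CQE instance over constants $\sigma$ with $\mathcal{P}$ tree-shaped, and let $\mathcal{O}'$ be a $\sigma$-rewriting of $\mathcal{O}$ such that $\mathcal{O}' \models \mathcal{O}$. If $\mathcal{V}'$ is an optimal view for $\mathbb{I}' = (\mathcal{O}', \mathcal{D}, \mathcal{P})$, then $\mathcal{H}_{\mathcal{O}',\mathcal{V}'}$ is an optimal view for $\mathbb{I}$.

Proof. Let $\mathcal{V} = \mathcal{H}_{\mathcal{O}',\mathcal{V}'}$. First we show that the censor preserves confidentiality. Since $\mathsf{vcens}_{\mathcal{V}'}$ is confidentiality preserving for $\mathbb{I}'$, we have $\mathcal{O}' \cup \mathcal{V}' \not\models \mathcal{P}(\vec s)$ for each $\vec s \in \mathsf{cert}(\mathcal{P}, \mathcal{O}, \mathcal{D})$. Since $\mathcal{O}'$ is Datalog, clearly $\mathcal{H}_{\mathcal{O}',\mathcal{V}} = \mathcal{H}_{\mathcal{O}',\mathcal{V}'}$; thus $\mathcal{O}' \cup \mathcal{V} \not\models \mathcal{P}(\vec s)$ for each $\vec s \in \mathsf{cert}(\mathcal{P}, \mathcal{O}, \mathcal{D})$. But since $\mathcal{P}$ is tree-shaped and $\mathcal{O}'$ is a rewriting of $\mathcal{O}$, we then have $\mathcal{O} \cup \mathcal{V} \not\models \mathcal{P}(\vec s)$ for each $\vec s \in \mathsf{cert}(\mathcal{P}, \mathcal{O}, \mathcal{D})$ (see [Stefanoni et al., 2013]), as required.

Now we turn to the optimality of the view. Assume, for the sake of contradiction, that $\mathsf{vcens}_{\mathcal{V}}$ is not optimal. Then, by Proposition 30, there exists a BCQ $Q$ such that (i) $\mathcal{O} \cup \mathcal{D} \models Q$; (ii) $\mathcal{O} \cup \mathcal{V} \not\models Q$; and (iii) $\mathcal{O} \cup \mathcal{V} \cup \{Q\} \not\models \mathcal{P}(\vec s)$ for each $\vec s \in \mathsf{cert}(\mathcal{P}, \mathcal{O}, \mathcal{D})$. Since $\mathcal{O} \cup \mathcal{D} \models Q$ and $\mathcal{O}' \models \mathcal{O}$, we have (iv) $\mathcal{O}' \cup \mathcal{D} \models Q$. Furthermore, condition (iii) implies that $\mathcal{O} \cup \mathcal{V} \cup [Q] \not\models \mathcal{P}(\vec s)$. Since $\mathcal{P}$ is tree-shaped and $\mathcal{O}'$ is a rewriting of $\mathcal{O}$, we obtain $\mathcal{O}' \cup \mathcal{V} \cup [Q] \not\models \mathcal{P}(\vec s)$. Since $\mathcal{V}' \subseteq \mathcal{V}$, this also implies (v) $\mathcal{O}' \cup \mathcal{V}' \cup \{Q\} \not\models \mathcal{P}(\vec s)$ for each $\vec s \in \mathsf{cert}(\mathcal{P}, \mathcal{O}, \mathcal{D})$. By (iv), (v), and the optimality of $\mathcal{V}'$ for $\mathbb{I}'$, we must have $\mathcal{O}' \cup \mathcal{V}' \models Q$. Since $\mathcal{V} = \mathcal{H}_{\mathcal{O}',\mathcal{V}'}$, we have $\mathcal{V} \models Q$, which contradicts (ii). □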


A.3 Proofs for Section 6

For ease of presentation, the proofs of the theorems and propositions of this section consider only the class of BCQs with constants. Clearly, any results obtained for this class also hold for the class of all CQs. Before proceeding to the main proofs, we introduce a few definitions and lemmas.







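Let $\mathcal{O}$ be a Datalog ontology and $\mathcal{D}$ a dataset, and let $\mathcal{Q}'$ be a possibly infinite set of queries such that $\mathcal{O} \cup \mathcal{D} \models Q$ for each $Q \in \mathcal{Q}'$. Then a censor $\mathsf{cens}_{\mathcal{Q}'}$ is defined as follows:

$$\mathsf{cens}_{\mathcal{Q}'}(Q) = \mathsf{True} \quad\text{iff}\quad \mathsf{cert}(Q, \mathcal{O}, \mathcal{D}) = \mathsf{True} \text{ and } [Q] \not\models Q' \text{ for each } Q' \in \mathcal{Q}'.$$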

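Lemma 32. Let $\mathbb{I} = (\mathcal{O}, \mathcal{D}, \mathcal{P})$ be a CQE instance, and let $\mathcal{T}$ be a pseudo-obstruction based on a subset $\mathcal{S}$ of $\mathcal{Q}$. Then $\mathsf{cens}_{\mathcal{T}} = \mathsf{cens}_{\mathcal{Q} \setminus \mathcal{S}}$.

Proof. Let $Q$ be a CQ such that $\mathsf{cert}(Q, \mathcal{O}, \mathcal{D}) = \mathsf{True}$.

Assume that $\mathsf{cens}_{\mathcal{Q} \setminus \mathcal{S}}(Q) = \mathsf{False}$. Then $[Q] \models Q'$ for some $Q' \in \mathcal{Q} \setminus \mathcal{S}$, so there exists $Q'' \in \mathcal{T}$ such that $Q' \models Q''$. Thus $[Q] \models Q''$, i.e., $\mathsf{cens}_{\mathcal{T}}(Q) = \mathsf{False}$.

Assume that $\mathsf{cens}_{\mathcal{T}}(Q) = \mathsf{False}$. Then $[Q] \models Q''$ for some $Q'' \in \mathcal{T}$. Note that $Q'' \in \mathcal{Q} \setminus \mathcal{S}$, since $\mathcal{T} \subseteq \mathcal{Q} \setminus \mathcal{S}$; thus $\mathsf{cens}_{\mathcal{Q} \setminus \mathcal{S}}(Q) = \mathsf{False}$. □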


The lemma above allows us to speak of obstruction censors in terms of either $\mathcal{T}$ or $\mathcal{Q} \setminus \mathcal{S}$, whichever is more convenient for proving the required results. We now show that a censor $\mathsf{cens}$ is optimal for a given CQE instance $\mathbb{I}$ iff there exists a maximal subset $\mathcal{S}$ of $\mathcal{Q}$ such that $\mathsf{cens} = \mathsf{cens}_{\mathcal{Q} \setminus \mathcal{S}}$. First, however, we need the following notion of a normalised proof.

Definition 33. Let $\mathcal{O}$ be a Datalog ontology, $\mathcal{D}$ a dataset, and $G_0$ a goal. A proof $\pi$ of length $n$ of $G_0$ in $\mathcal{O} \cup \mathcal{D}$ is normalised if there is $k \leq n$ such that $r_i \in \mathcal{O}$ for each $i < k$ and $r_j \in \mathcal{D}$ for each $j \geq k$. The number $k$ is called the frontier of $\pi$, denoted $\mathsf{fr}(\pi)$.

Intuitively, a normalised proof $\pi$ works as follows. First, we rewrite the initial query $G_0$ over the ontology $\mathcal{O}$ until we obtain the query $G_{\mathsf{fr}(\pi)-1}$, which can be mapped into $\mathcal{D}$. Then we perform this mapping by applying $(r_i, \theta_i)$ with $i \geq \mathsf{fr}(\pi)$. Observe that $\mathcal{O} \cup G_i \models G_0$ for every $G_i$ with $i < \mathsf{fr}(\pi)$.
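The shape of a normalised proof can be illustrated in the propositional special case, where a goal is a set of atoms. The Python sketch below is ours, and it makes two simplifying assumptions: propositional rules given as `(head, body)` pairs, and a depth bound. It first rewrites the goal with rules of $\mathcal{O}$ and only then checks that every remaining atom is a fact of $\mathcal{D}$. The returned rule sequence therefore ends exactly at the frontier.

```python
def normalised_proof(goal, rules, facts, depth=10):
    """Rule steps r_1..r_{k-1} of a normalised proof of a propositional goal.

    Rules are (head, body) pairs with body a tuple of atoms. The search only
    rewrites atoms that are not facts; once the goal is contained in the facts,
    the frontier is reached and the remaining steps use facts only.
    Returns the list of rules used, or None if none is found within depth."""
    goal = frozenset(goal)
    if goal <= facts:
        return []
    if depth == 0:
        return None
    for atom in sorted(goal - facts):
        for head, body in rules:
            if head == atom:
                rest = normalised_proof((goal - {atom}) | set(body),
                                        rules, facts, depth - 1)
                if rest is not None:
                    return [(head, body)] + rest
    return None


rules = [("p", ("q", "r")), ("q", ("s",))]
facts = {"r", "s"}
print(normalised_proof({"p"}, rules, facts))  # [('p', ('q', 'r')), ('q', ('s',))]
```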

We exploit the following known result about SLD resolution over Datalog ontologies. 

Lemma 34. Let $\mathcal{O}$ be a Datalog ontology, $\mathcal{D}$ a dataset, and $G_0$ a goal such that $\mathcal{O} \cup \mathcal{D} \models G_0$. Then there exists a normalised SLD proof $\pi$ of $G_0$ in $\mathcal{O} \cup \mathcal{D}$.

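Lemma 35. Let $\mathbb{I} = (\mathcal{O}, \mathcal{D}, \mathcal{P})$ be a CQE instance with $\mathcal{O}$ a Datalog ontology, and let $\mathsf{cens}$ be a censor for $\mathcal{O}$ and $\mathcal{D}$. Then $\mathsf{cens}$ is optimal for $\mathbb{I}$ iff there exists a maximal subset $\mathcal{S}$ of $\mathcal{Q}$ such that (i) $\mathcal{O} \cup \mathcal{S} \not\models \mathcal{P}(\vec s)$ for each $\vec s \in \mathsf{cert}(\mathcal{P}, \mathcal{O}, \mathcal{D})$ and (ii) $\mathsf{cens} = \mathsf{cens}_{\mathcal{Q} \setminus \mathcal{S}}$.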


Proof. We start with the “only if” direction. Assume that such a maximal subset $\mathcal{S}$ exists. We show that $\mathsf{cens}_{\mathcal{Q} \setminus \mathcal{S}}$ is optimal.

First, we show that $\mathsf{cens}_{\mathcal{Q} \setminus \mathcal{S}}$ is confidentiality preserving. Assume the contrary. Then there is a (finite) subset $\mathcal{F}$ of $\mathsf{Th}_{\mathsf{cens}_{\mathcal{Q} \setminus \mathcal{S}}}$ such that $\mathcal{O} \cup \mathcal{F} \models \mathcal{P}(\vec s)$ for some $\vec s \in \mathsf{cert}(\mathcal{P}, \mathcal{O}, \mathcal{D})$. This yields a proof $\pi$ of $\mathcal{P}(\vec s)$ in $\mathcal{O} \cup [\mathcal{F}]$, where $[\mathcal{F}] = \bigcup_{Q \in \mathcal{F}} [Q]$. By Lemma 34, we can assume that $\pi$ is normalised with frontier $k + 1$. Let $G_k$ be the goal right before the frontier in $\pi$. Since $\pi$ is normalised, $G_k$ is proved using only facts from $[\mathcal{F}]$. So we can write $G_k = B_1 \land \ldots \land B_m$, where each $B_j$ is the conjunction of all atoms that are proved using facts only from a particular $[Q_j]$. The order in which these $B_j$ are proved is irrelevant, so assume that all $B_j$ have been proved except $B_i$. Since the different $B_j$ can share variables, the remaining goal may not be just $B_i$, but rather $B_i\theta_i$ for some substitution $\theta_i$. We make the following observations:

1. $B_i\theta_i$ does not mention any constants outside $\mathcal{O} \cup \mathcal{D}$. Indeed, for any distinct queries $Q_k$, $Q_j$ in $\mathcal{F}$, $[Q_k]$ and $[Q_j]$ share only constants from $\mathcal{O} \cup \mathcal{D}$. Thus, if $B_i\theta_i$ contained some constant coming from $[Q_j]$ with $j \neq i$, it could not be proved using only facts from $[Q_i]$.

2. There exists a proof of $\mathcal{P}(\vec s)$ in $\mathcal{O} \cup \mathcal{D}$ in which $B_i\theta_i$ occurs as a subgoal. We construct such a proof as follows. First, we can “reach” the goal $G_k$, because this only requires rules from $\mathcal{O}$. Each $B_j$ follows from $\mathcal{O} \cup \mathcal{D}$, so we can continue the proof by proving all $B_j$ except $B_i$. Moreover, we can do this in such a way that we reach precisely $B_i\theta_i$ as a subgoal.

3. $Q_i \models \exists^* B_i\theta_i$, since $B_i\theta_i$ is provable from $[Q_i]$.

Observation 2 means that $\exists^* B_i\theta_i \in \mathcal{Q}$ for all $1 \leq i \leq m$. Furthermore, the censor answers $\mathsf{True}$ for each $Q_i$, so Observation 3 implies that $\exists^* B_i\theta_i \notin \mathcal{Q} \setminus \mathcal{S}$, that is, $\exists^* B_i\theta_i \in \mathcal{S}$. But then $\mathcal{O} \cup \mathcal{S} \models \mathcal{P}(\vec s)$, which is a contradiction.


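Now we show the optimality of $\mathsf{cens}_{\mathcal{Q} \setminus \mathcal{S}}$. Clearly, a censor $\mathsf{cens}$ for $\mathbb{I} = (\mathcal{O}, \mathcal{D}, \mathcal{P})$ is optimal if and only if the following holds for each CQ $Q(\vec x)$ and each $\vec t \in \mathsf{cert}(Q, \mathcal{O}, \mathcal{D})$: if $\mathcal{O} \cup \mathsf{Th}_{\mathsf{cens}} \cup \{Q(\vec t)\} \not\models \mathcal{P}(\vec s)$ for each $\vec s \in \mathsf{cert}(\mathcal{P}, \mathcal{O}, \mathcal{D})$, then $\mathcal{O} \cup \mathsf{Th}_{\mathsf{cens}} \models Q(\vec t)$. Hence, $\mathsf{cens}_{\mathcal{Q} \setminus \mathcal{S}}$ is optimal if and only if $\mathcal{O} \cup \mathsf{Th}_{\mathsf{cens}_{\mathcal{Q} \setminus \mathcal{S}}} \models Q$ for each $Q$ such that $\mathsf{cert}(Q, \mathcal{O}, \mathcal{D}) = \mathsf{True}$ and $\mathcal{O} \cup \mathsf{Th}_{\mathsf{cens}_{\mathcal{Q} \setminus \mathcal{S}}} \cup \{Q\} \not\models \mathcal{P}(\vec s)$.

Assume, to the contrary, that some CQ $Q$ satisfies $\mathsf{cert}(Q, \mathcal{O}, \mathcal{D}) = \mathsf{True}$ and $\mathcal{O} \cup \mathsf{Th}_{\mathsf{cens}_{\mathcal{Q} \setminus \mathcal{S}}} \cup \{Q\} \not\models \mathcal{P}(\vec s)$, but $\mathcal{O} \cup \mathsf{Th}_{\mathsf{cens}_{\mathcal{Q} \setminus \mathcal{S}}} \not\models Q$. The latter means that $\mathsf{cens}_{\mathcal{Q} \setminus \mathcal{S}}(Q) = \mathsf{False}$, that is, $[Q] \models Q'$ for some $Q' \in \mathcal{Q} \setminus \mathcal{S}$. By the maximality of $\mathcal{S}$, every $Q' \in \mathcal{Q} \setminus \mathcal{S}$ satisfies $\mathcal{O} \cup \mathcal{S} \cup \{Q'\} \models \mathcal{P}(\vec s)$ for some $\vec s$. Since $Q \models Q'$, also $\mathcal{O} \cup \mathcal{S} \cup \{Q\} \models \mathcal{P}(\vec s)$. Since $\mathcal{S} \subseteq \mathsf{Th}_{\mathsf{cens}_{\mathcal{Q} \setminus \mathcal{S}}}$, this yields $\mathcal{O} \cup \mathsf{Th}_{\mathsf{cens}_{\mathcal{Q} \setminus \mathcal{S}}} \cup \{Q\} \models \mathcal{P}(\vec s)$. This contradicts the initial assumption and concludes the “only if” direction.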



Now we consider the “if’-direction. Let us now assume that cens is optimal, and let Q' = {Q | cens(Q) = False}. 
Consider the following subset S of Q: S = Q \ Q'. To prove the “if’-direction, it suffices to prove the following two conditions: 
(i) S is a maximal subset of Q such that O U S ^ for s^ch s G cert(P, O, V) and (ii) censQ\§ = cens. 

To show (i), assume that O U S U {Q} ^ P{s) for some s G cert(P, O, V) and some Q G Q. Clearly, since by construction 
S C Thcens, it holds that O U Thcens U {Q} |= P(^, and therefore cens((5) = False, i.e. Q G Q', which implies (i). 

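To show (ii), pick an arbitrary $Q$ such that $\mathcal{O} \cup \mathcal{D} \models Q$ but $\mathsf{cens}(Q) = \mathrm{False}$, and hence $Q \in \mathcal{Q}'$. Since $\mathsf{cens}$ is optimal, we have that $\mathcal{O} \cup \mathit{Th}_{\mathsf{cens}} \cup \{Q\} \models \mathcal{P}(\vec{s})$ for some $\vec{s} \in \mathit{cert}(\mathcal{P}, \mathcal{O}, \mathcal{D})$, so let $\mathcal{F}$ be any minimal subset of $\mathit{Th}_{\mathsf{cens}}$ such that $\mathcal{O} \cup \mathcal{F} \cup \{Q\} \models \mathcal{P}(\vec{s})$. Following the same arguments as in the “only if” direction, there exists $\exists^* G \in \mathcal{Q}\setminus\mathcal{S}$ such that $Q \models \exists^* G$; since $\exists^* G$ is part of the obstruction, $\mathsf{cens}_{\mathcal{Q}\setminus\mathcal{S}}(Q) = \mathrm{False}$. Conversely, assume that $\mathsf{cens}_{\mathcal{Q}\setminus\mathcal{S}}(Q) = \mathrm{False}$; then $Q \models \exists^* G$ for some $\exists^* G \in \mathcal{Q}\setminus\mathcal{S}$. Since $\mathcal{Q}\setminus\mathcal{S} \subseteq \mathcal{Q}'$, we have that $\mathsf{cens}(Q) = \mathrm{False}$, as required. □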
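When $\mathcal{Q}$ happens to be finite, a maximal subset $\mathcal{S}$ as in condition (i) can be obtained greedily, given an entailment oracle. The sketch below is an illustration, not part of the proof: `is_safe` is a hypothetical stand-in for the test "$\mathcal{O} \cup X \not\models \mathcal{P}(\vec{s})$ for each $\vec{s} \in \mathit{cert}(\mathcal{P}, \mathcal{O}, \mathcal{D})$", and correctness relies only on the monotonicity of entailment: a candidate rejected early stays unsafe once more queries are added.

```python
from typing import Callable, FrozenSet, Hashable, Iterable


def greedy_maximal_subset(candidates: Iterable[Hashable],
                          is_safe: Callable[[FrozenSet[Hashable]], bool]) -> FrozenSet[Hashable]:
    """Return a maximal subset S of `candidates` with is_safe(S) true.

    Assumes is_safe(frozenset()) holds and that is_safe is anti-monotone
    (is_safe(X) and Y <= X imply is_safe(Y)); this is the case for
    "O u X entails no policy answer", because entailment is monotone.
    """
    chosen: FrozenSet[Hashable] = frozenset()
    for cand in candidates:
        trial = chosen | {cand}
        if is_safe(trial):
            chosen = trial
    # Every rejected candidate was unsafe together with a subset of `chosen`,
    # hence also together with `chosen` itself, so `chosen` is maximal.
    return chosen


# Toy oracle: the policy is compromised exactly when q1 and q2 are both disclosed.
def toy_safe(xs: FrozenSet[Hashable]) -> bool:
    return not {"q1", "q2"} <= xs


print(sorted(greedy_maximal_subset(["q1", "q2", "q3"], toy_safe)))  # ['q1', 'q3']
```

Different candidate orders may yield different maximal subsets; with the toy oracle, starting from `q2` gives `{q2, q3}` instead.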

Theorem 22. Let $\mathcal{I}$ be a Datalog CQE instance.

1. If $\mathcal{T}$ is a finite pseudo-obstruction for $\mathcal{I}$, then $\bigvee_{Q \in \mathcal{T}} Q$ is an optimal obstruction for $\mathcal{I}$.

2. If each pseudo-obstruction for $\mathcal{I}$ is infinite, then no optimal obstruction censor for $\mathcal{I}$ exists.


Proof. Let us prove Statement 1. Assume that $\mathcal{T}$ is a finite pseudo-obstruction. By Lemma 32 we have that $\mathsf{cens}_{\mathcal{T}} =$

By the “only if” statement in Lemma 35 we have that $\mathsf{cens}_{\mathcal{Q}\setminus\mathcal{S}}$ is optimal. But then, since $\mathcal{T}$ is finite, $\mathcal{U}$ is an obstruction.


Next, we show Statement 2. Assume by contradiction that each pseudo-obstruction is infinite, but there is an optimal censor based on an obstruction $\mathcal{U}$. Since $\mathsf{ocens}_{\mathcal{U}}$ is an optimal censor, the “if” direction of Lemma 35 tells us that there exists a pseudo-obstruction $\mathcal{T}$ such that $\mathsf{ocens}_{\mathcal{U}} = \mathsf{cens}_{\mathcal{T}}$. We show that then there exists a finite pseudo-obstruction, which contradicts the assumption above. Pick any disjunct $Q$ of $\mathcal{U}$; then, clearly, $\mathsf{ocens}_{\mathcal{U}}(Q) = \mathrm{False}$ and hence $\mathsf{cens}_{\mathcal{T}}(Q) = \mathrm{False}$. The latter implies that there exists $Q' \in \mathcal{T}$ such that $Q \models Q'$. Let us now construct $\mathcal{U}' = \bigcup_{Q \in \mathcal{U}} Q'$, which is finite and also a “subset” of $\mathcal{T}$. To obtain a contradiction, it thus suffices to show that $\mathsf{ocens}_{\mathcal{U}} = \mathsf{ocens}_{\mathcal{U}'}$. Indeed, for each CQ $Q$ such that $\mathit{cert}(Q, \mathcal{O}, \mathcal{D}) = \mathrm{True}$ (recall that $\mathsf{ocens}_{\mathcal{U}} = \mathsf{cens}_{\mathcal{T}}$):


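• Assume that $\mathsf{ocens}_{\mathcal{U}}(Q) = \mathrm{False}$; then there is $Q'$ in $\mathcal{U}$ such that $[Q] \models Q'$, which yields $[Q] \models Q''$ for some $Q''$ in $\mathcal{U}'$, and therefore $\mathsf{ocens}_{\mathcal{U}'}(Q) = \mathrm{False}$.

• Assume that $\mathsf{ocens}_{\mathcal{U}'}(Q) = \mathrm{False}$; then $[Q] \models Q''$ for some $Q''$ in $\mathcal{U}'$, and consequently, since $Q'' \in \mathcal{T}$, we conclude that $\mathsf{cens}_{\mathcal{T}}(Q) = \mathrm{False}$.

The obtained contradiction concludes the proof. □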


Theorem 23. There is a CQE instance that is both RL and EL and admits an optimal view but no optimal obstruction. Conversely, there exists an RL CQE instance that admits an optimal obstruction but no optimal view.


Proof. To show the first statement, consider $\mathcal{I}_1 = (\mathcal{O}_1, \mathcal{D}_1, \mathcal{P}_1)$, where $\mathcal{D}_1 = \{R(a, a), A(a)\}$, $\mathcal{P}_1 = A(a)$, and the guarded RL (and EL) ontology $\mathcal{O}_1 = \{R(x, y) \wedge A(y) \rightarrow A(x)\}$. Since this CQE instance is guarded and tree-shaped, by Theorem 14 we can devise an optimal view. No optimal obstruction, however, exists, as shown in Example 20.

To show the second statement, consider the CQE instance $\mathcal{I}_2 = (\mathcal{O}_2, \mathcal{D}_2, \mathcal{P}_2)$, with $\mathcal{D}_2 =$ … a), $\mathcal{P}_2 = A(a)$, and

$\mathcal{O}_2 = \{R(x_1, y) \wedge R(x_2, y) \rightarrow x_1 \approx x_2,\ R(x, y) \rightarrow A(y)\}$. From [Cuenca Grau et al., 2013] we know that no optimal view exists for this instance, and the proof can easily be extended to our framework (note that our notion of a censor $\mathsf{vcens}_{\mathcal{V}}$ based on a view $\mathcal{V}$ differs from the one in [Cuenca Grau et al., 2013]) and also to the case where views are not required to be sound. However, $\mathcal{U} = A(a) \vee \exists x.\, R(x, a)$ is an optimal obstruction, since there is only one proof of $A(a)$ with subgoal $R(x, a)$. □
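To see where infinitely many candidate goals come from in $\mathcal{I}_1$, note that resolving the policy $A(a)$ repeatedly with the single rule of $\mathcal{O}_1$ produces the goals $A(a)$, $R(a, y_1) \wedge A(y_1)$, $R(a, y_1) \wedge R(y_1, y_2) \wedge A(y_2)$, and so on, and each of them is entailed by $\mathcal{D}_1$ by sending all variables to $a$. The Python sketch below uses a hypothetical encoding (variables prefixed with `?`) to generate these goals and check that property; it illustrates the observation and does not replace the argument of Example 20.

```python
from typing import List, Set, Tuple

Atom = Tuple[str, ...]  # (predicate, term, ...); variables start with "?"

D1: Set[Atom] = {("R", "a", "a"), ("A", "a")}


def chain_goal(k: int) -> List[Atom]:
    """Goal reached from A(a) after k resolution steps with R(x, y) & A(y) -> A(x)."""
    terms = ["a"] + [f"?y{i}" for i in range(1, k + 1)]
    atoms = [("R", terms[i], terms[i + 1]) for i in range(k)]
    atoms.append(("A", terms[k]))
    return atoms


def holds_with_all_vars_to_a(goal: List[Atom], dataset: Set[Atom]) -> bool:
    """Check that sending every variable to the constant a maps the goal into the dataset."""
    grounded = {(atom[0],) + tuple("a" if t.startswith("?") else t for t in atom[1:])
                for atom in goal}
    return grounded <= dataset


for k in range(4):
    print(k, chain_goal(k), holds_with_all_vars_to_a(chain_goal(k), D1))
```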


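Theorem 25. Let $\mathcal{I} = (\mathcal{O}, \mathcal{D}, \mathcal{P})$ be a linear Datalog CQE instance, and let $S$ be the set of all nodes in the proof graph of $\mathcal{O} \cup \mathcal{D}$ on the paths from facts $\mathcal{P}(\vec{a})$, for any tuple of constants $\vec{a}$, to $\top$. Then the Boolean UCQ

$$\mathcal{U} = \bigvee_{G \in S \setminus \{\top\}} \exists^* G$$

is an optimal obstruction computable in polynomial time, and $\mathsf{ocens}_{\mathcal{U}}$ is the unique optimal censor for $\mathcal{I}$.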

Proof. Optimality and uniqueness follow from Theorem ?? and the facts that (i) the set $S$ is exactly $\mathcal{Q}$, and (ii) the only maximal subset $\mathcal{S}$ of $\mathcal{Q}$ such that $\mathcal{O} \cup \mathcal{S}$ does not entail any $\mathcal{P}(\vec{s})$ is the empty set. To prove the former fact, first observe that any goal that can appear in any SLD proof in $\mathcal{O} \cup \mathcal{D}$ is isomorphic to one of the nodes of the proof graph of $\mathcal{O} \cup \mathcal{D}$; then Fact (i) follows directly from the construction of the proof graph. Fact (ii) follows from the observation that, for linear $\mathcal{O}$, each SLD proof is normalised, and therefore for each $Q \in \mathcal{Q}$ it holds that $\mathcal{O} \cup \{Q\} \models \mathcal{P}(\vec{s})$ for some $\vec{s} \in \mathit{cert}(\mathcal{P}, \mathcal{O}, \mathcal{D})$.

Finally, polynomiality follows from the fact that in linear Datalog the size of the proof graph is at most cubic in $|\mathcal{O} \cup \mathcal{D}|$. □
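The set $S$ in Theorem 25 consists of the proof-graph nodes that are reachable from some policy fact and from which $\top$ is reachable; two breadth-first searches, one forward and one over reversed edges, compute it in time linear in the size of the graph. The sketch below assumes a hypothetical adjacency-list encoding in which an edge leads from a goal to a goal derived from it, with the string `"TOP"` standing for $\top$; building the proof graph itself is not shown.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List, Set

Graph = Dict[Hashable, List[Hashable]]


def reachable(starts: Iterable[Hashable], edges: Graph) -> Set[Hashable]:
    """All nodes reachable from `starts` (inclusive) by breadth-first search."""
    seen = set(starts)
    queue = deque(seen)
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def nodes_on_paths(succ: Graph, sources: Iterable[Hashable], sink: Hashable) -> Set[Hashable]:
    """Nodes lying on at least one path from a source to the sink."""
    pred: Graph = {}
    for node, nexts in succ.items():
        for nxt in nexts:
            pred.setdefault(nxt, []).append(node)
    return reachable(sources, succ) & reachable([sink], pred)


# Toy proof graph: only P(a) -> G1 -> TOP connects a policy fact to TOP.
succ = {"P(a)": ["G1", "G2"], "G1": ["TOP"], "G3": ["TOP"]}
on_paths = nodes_on_paths(succ, ["P(a)"], "TOP")
disjuncts = sorted(on_paths - {"TOP"})  # the goals G in S \ {TOP}
print(disjuncts)  # ['G1', 'P(a)']
```

Each remaining node $G$ would then contribute the disjunct $\exists^* G$ to $\mathcal{U}$.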

Theorem 27. Every QL CQE instance admits a unique optimal censor based on an obstruction that can be computed in 
polynomial time. 










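Proof. Let $\mathcal{I} = (\mathcal{O}, \mathcal{D}, \mathcal{P})$ be a CQE instance with $\mathcal{O}$ in QL. Let $\mathsf{cens}'$ be the optimal censor for $\mathcal{I}' = (\Xi_\sigma(\mathcal{O}), \mathcal{D}, \mathcal{P})$, where $\sigma$ is the set of constants of $\mathcal{I}$ and $\Xi_\sigma(\mathcal{O})$ is a linear Datalog ontology. By Theorem 25, $\mathsf{cens}' = \mathsf{ocens}_{\mathcal{U}}$ for the UCQ $\mathcal{U}$ defined in that theorem. Let $\mathsf{cens} = \mathsf{ocens}_{\mathcal{U}}$.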


We are going to show that cens is an optimal censor for I. 


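Confidentiality preservation. Assume that $\mathsf{cens}$ is not confidentiality preserving for $\mathcal{I}$, that is, $\mathcal{O} \cup \mathit{Th}_{\mathsf{cens}} \models \mathcal{P}(\vec{s})$ for some $\vec{s} \in \mathit{cert}(\mathcal{P}, \mathcal{O}, \mathcal{D})$. This means that there exist $Q_1, \ldots, Q_n \in \mathit{Th}_{\mathsf{cens}}$ such that $\mathcal{O} \cup \{Q_1, \ldots, Q_n\} \models \mathcal{P}(\vec{s})$; clearly, $\mathcal{O} \cup \mathcal{D} \models Q_i$ for each $i \in \{1, \ldots, n\}$. By Proposition ??, $\Xi_\sigma(\mathcal{O}) \models \mathcal{O}$, and consequently $\Xi_\sigma(\mathcal{O}) \cup \mathcal{D} \models Q_i$ for each $i \in \{1, \ldots, n\}$. Since $\mathsf{cens}'$ is confidentiality preserving for $\mathcal{I}'$, we conclude that $\{Q_1, \ldots, Q_n\} \not\subseteq \mathit{Th}_{\mathsf{cens}'}$, so there is $j \in \{1, \ldots, n\}$ such that $\mathsf{cens}'(Q_j) = \mathrm{False}$, i.e., $[Q_j] \models \mathcal{U}$. The last entailment implies that $\mathsf{cens}(Q_j) = \mathrm{False}$, i.e., $Q_j \notin \mathit{Th}_{\mathsf{cens}}$, which yields a contradiction; thus $\mathsf{cens}$ is confidentiality preserving for $\mathcal{I}$.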


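Optimality. Assume, for the sake of contradiction, that $\mathsf{cens}$ is not optimal for $\mathcal{I}$, that is, there exists $Q$ such that (i) $\mathcal{O} \cup \mathcal{D} \models Q$, (ii) $Q \notin \mathit{Th}_{\mathsf{cens}}$, and (iii) $\mathcal{O} \cup \mathit{Th}_{\mathsf{cens}} \cup \{Q\} \not\models \mathcal{P}(\vec{s})$ for each $\vec{s} \in \mathit{cert}(\mathcal{P}, \mathcal{O}, \mathcal{D})$. This yields $[Q] \models u$ for some disjunct $u$ in $\mathcal{U}$, and consequently $\mathsf{cens}'(Q) = \mathrm{False}$. Note that for each disjunct $u$ in $\mathcal{U}$ it holds that $\Xi_\sigma(\mathcal{O}) \cup \{u\} \models \mathcal{P}(\vec{s})$ for some $\vec{s} \in \mathit{cert}(\mathcal{P}, \mathcal{O}, \mathcal{D})$; thus $\Xi_\sigma(\mathcal{O}) \cup \{Q\} \models \mathcal{P}(\vec{s})$. There are the following cases, depending on the form of $u$.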

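• If $u$ is of the form $A(a)$ or $R(a, b)$ with $a, b \in \sigma$, then $\mathcal{O} \cup \{u\} \models \mathcal{P}(\vec{s})$ since, due to Proposition ??, $\Xi_\sigma(\mathcal{O})$ is a $\sigma$-rewriting of $\mathcal{O}$; thus $\mathcal{O} \cup \{Q\} \models \mathcal{P}(\vec{s})$, which contradicts (iii).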

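• If $u$ is of the form $\exists y.\, R(a, y)$ with $a \in \sigma$, then let $\mathcal{O}_{\min}$ be a minimal subset of $\Xi_\sigma(\mathcal{O})$ such that $\mathcal{O}_{\min} \cup \{u\} \models \mathcal{P}(\vec{s})$. Due to (iii) and $[Q] \models u$, it holds that $\mathcal{O} \cup \{u\} \not\models \mathcal{P}(\vec{s})$; thus $\mathcal{O}_{\min} \not\subseteq \mathcal{O}$, and therefore $\mathcal{O}_{\min}$ includes one of the rules introduced by the Skolemisation. That is, $\mathcal{O}_{\min}$ contains (some of) the following rules, which come from the Skolemisation of some rule $r = A(x) \rightarrow \exists y.[S(x, y) \wedge B(y)]$ of Type (3) in $\mathcal{O}$:

$$A(x) \rightarrow P_S(x, c_{A,S}), \qquad P_S(x, y) \rightarrow S(x, y), \qquad P_S(x, y) \rightarrow B(y). \qquad (13)$$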

Consider a proof $\pi = G_0, \ldots, G_n$ of $\mathcal{P}(\vec{s})$ in $\Xi_\sigma(\mathcal{O}) \cup [\exists y.\, R(a, y)]$, where $G_0 = \mathcal{P}(\vec{s})$. Clearly, $G_i$ can be obtained from $G_{i-1}$ by applying a rule from $\mathcal{O}_{\min}$ for each $i = 1, \ldots, n-1$, and $G_{n-1} = R(a, x')$ for some $x'$, since the last step of the proof applies the only rule from $[\exists y.\, R(a, y)]$. Let $G_k$ be the first goal in $\pi$ obtained from $G_{k-1}$ by applying a rule from Equation (13); clearly, $\mathcal{O} \cup \{\exists^* G_{k-1}\} \models \exists^* G_0$. We have the following cases.

- Assume that we apply the third rule from Equation (13) to $G_{k-1} = B(b)$ for some constant $b$ (note that a goal $B(x)$ with $x$ a Skolem constant cannot appear by applying QL rules except for Type (3)). Then $G_k = P_S(x, b)$, and the only rule that has $P_S$ in its head is the first one from Equation (13); however, this rule cannot be applied to $G_k$, since $b$ and $c_{A,S}$ cannot be unified. Thus, this case is invalid.

- Assume that we apply the second rule from Equation (13) to $G_{k-1} = S(b, d)$ for some constants $b$ and $d$. This case is invalid for the same reason as the previous one.

- Assume that we apply the second rule from Equation (13) to $G_{k-1} = S(b, x)$ for some constant $b$ and Skolem constant $x$. Then $G_k = P_S(b, x)$, and $G_{k+1}$ is obtained from $G_k$ by applying the first rule from Equation (13); that is, $G_{k+1} = A(b)$. But then we have that $A(x) \rightarrow \exists y.[S(x, y) \wedge B(y)] \in \mathcal{O}$, and consequently $\mathcal{O} \cup \{A(b)\} \models \exists^* G_{k-1}$. W.l.o.g. we can assume that, starting from $G_{k+1}$, only rules from $\mathcal{O}$ are used, which means that $\mathcal{O} \cup [\exists y.\, R(a, y)] \models A(b)$.

- No other case is possible.

Thus $\mathcal{O} \cup \{u\} \models \mathcal{P}(\vec{s})$, which contradicts (iii).

Thus, cens is optimal for I, which concludes the proof. □ 
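The case analysis above turns on whether a goal can be resolved against the rules (13). The following Python sketch encodes (13) for a generic Type (3) rule and performs single-atom SLD resolution steps with a unifier for function-free atoms; the encoding, the names `RULES_13` and `SKOLEM`, and the variable-renaming scheme are assumptions of this illustration. It reproduces the first case (the goal $P_S(x, b)$ does not resolve against the first rule, since $b$ and $c_{A,S}$ are distinct constants) and the third case (resolving $P_S(b, x)$ against the first rule yields $A(b)$).

```python
import itertools
from typing import Dict, List, Optional, Tuple

Atom = Tuple[str, ...]          # (predicate, term, ...); variables start with "?"
Rule = Tuple[Atom, List[Atom]]  # (head, body)

SKOLEM = "c_A_S"                # stands for the Skolem constant c_{A,S}
RULES_13: List[Rule] = [
    (("P_S", "?x", SKOLEM), [("A", "?x")]),     # A(x) -> P_S(x, c_{A,S})
    (("S", "?x", "?y"), [("P_S", "?x", "?y")]),  # P_S(x, y) -> S(x, y)
    (("B", "?y"), [("P_S", "?x", "?y")]),        # P_S(x, y) -> B(y)
]
_fresh = itertools.count()


def is_var(t: str) -> bool:
    return t.startswith("?")


def walk(t: str, subst: Dict[str, str]) -> str:
    """Follow variable bindings until reaching a constant or an unbound variable."""
    while is_var(t) and t in subst:
        t = subst[t]
    return t


def unify(a1: Atom, a2: Atom) -> Optional[Dict[str, str]]:
    """Most general unifier of two function-free atoms, or None."""
    if a1[0] != a2[0] or len(a1) != len(a2):
        return None
    subst: Dict[str, str] = {}
    for s, t in zip(a1[1:], a2[1:]):
        s, t = walk(s, subst), walk(t, subst)
        if s == t:
            continue
        if is_var(s):
            subst[s] = t
        elif is_var(t):
            subst[t] = s
        else:
            return None  # two distinct constants never unify
    return subst


def resolve(goal: Atom, rule: Rule) -> Optional[List[Atom]]:
    """One SLD step on a single-atom goal; rule variables are renamed apart first."""
    suffix = f"_{next(_fresh)}"

    def ren(a: Atom) -> Atom:
        return (a[0],) + tuple(t + suffix if is_var(t) else t for t in a[1:])

    head, body = ren(rule[0]), [ren(b) for b in rule[1]]
    subst = unify(goal, head)
    if subst is None:
        return None
    return [(b[0],) + tuple(walk(t, subst) for t in b[1:]) for b in body]


print(resolve(("P_S", "?x", "b"), RULES_13[0]))  # None: b vs c_A_S
print(resolve(("P_S", "b", "?x"), RULES_13[0]))  # [('A', 'b')]
```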


B Appendix (Algorithms) 


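Algorithm 1: Compute an optimal view for a guarded tree-shaped CQE instance

INPUT : a guarded CQE instance $\mathcal{I} = (\mathcal{O}, \mathcal{D}, \mathcal{P})$

OUTPUT: a dataset $\mathcal{V}$

1 $\mathcal{O}_e := \mathcal{O} \cup \bigcup_{\text{binary } R \text{ in } \mathcal{O}} \{R(x, y) \rightarrow S_R(x),\ R(x, y) \rightarrow P_R(y)\}$;

2 $H$ := the minimal Herbrand model of $\mathcal{O}_e$ and $\mathcal{D}$;

3 $\mathcal{V}$ := a maximal subset of unary atoms from $H$ s.t. $\mathit{cert}(\mathcal{P}, \mathcal{O}_e, \mathcal{V}) = \emptyset$;

4 for each constant $a$ from $\mathcal{D}$ do $\mathcal{V}$ := AddUnPredicates($a$)

5 for each $R(a, b) \in \mathcal{D}$ such that $R$ is not $\approx$ do $\mathcal{V}$ := AddBinPredicates($R(a, b)$)

6 return $\mathcal{V}$;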
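Steps 1 and 2 of Algorithm 1 can be made concrete with naive bottom-up evaluation, which for a Datalog program reaches the minimal Herbrand model as its least fixpoint. The Python sketch below is an illustration under stated assumptions, not the authors' implementation: rules are (head, body) pairs over tuple atoms with `?`-prefixed variables, the fresh unary predicates of step 1 are named `S_R` and `P_R` by string concatenation, and equality atoms as well as steps 3 to 6 are not handled.

```python
from typing import Dict, Iterator, List, Set, Tuple

Atom = Tuple[str, ...]
Rule = Tuple[Atom, List[Atom]]


def matches(body: List[Atom], facts: Set[Atom], subst: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Enumerate substitutions mapping every body atom onto some fact."""
    if not body:
        yield subst
        return
    first, rest = body[0], body[1:]
    for fact in facts:
        if fact[0] != first[0] or len(fact) != len(first):
            continue
        s, ok = dict(subst), True
        for term, value in zip(first[1:], fact[1:]):
            if term.startswith("?"):
                if s.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from matches(rest, facts, s)


def extend_ontology(rules: List[Rule], binary_preds: List[str]) -> List[Rule]:
    """Step 1: add R(x, y) -> S_R(x) and R(x, y) -> P_R(y) for each binary R."""
    extra: List[Rule] = []
    for r in binary_preds:
        extra.append((("S_" + r, "?x"), [(r, "?x", "?y")]))
        extra.append((("P_" + r, "?y"), [(r, "?x", "?y")]))
    return rules + extra


def minimal_model(rules: List[Rule], dataset: Set[Atom]) -> Set[Atom]:
    """Step 2: naive fixpoint iteration (assumes safe rules without equality)."""
    model = set(dataset)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            # Materialise all matches before adding facts to the model.
            for s in list(matches(body, model, {})):
                fact = (head[0],) + tuple(s.get(t, t) for t in head[1:])
                if fact not in model:
                    model.add(fact)
                    changed = True
    return model


O1 = [(("A", "?x"), [("R", "?x", "?y"), ("A", "?y")])]
H = minimal_model(extend_ontology(O1, ["R"]), {("R", "b", "a"), ("A", "a")})
print(sorted(H))
```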







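Algorithm 2: Sub-routines for Algorithm 1
Sub-routine AddUnPredicates

INPUT : a constant $a$
OUTPUT: a dataset $\mathcal{V}'$

1 $\mathcal{V}' := \mathcal{V}$;
2 $C := \{A \mid A(a) \in H\}$;
3 $\sigma_a := \{a\}$;
4 for each subset Sub of $C$ closed under … do
5     create a globally fresh copy $a_{\mathit{Sub}}$ of $a$;
6     if $\mathcal{O}_e \cup \mathcal{V}' \cup \{A(a_{\mathit{Sub}}) \mid A \in \mathit{Sub}\} \not\models \mathcal{P}(\vec{s})$ for each $\vec{s} \in \mathit{cert}(\mathcal{P}, \mathcal{O}, \mathcal{D})$ then
7         $\mathcal{V}' := \mathcal{V}' \cup \{A(a_{\mathit{Sub}}) \mid A \in \mathit{Sub}\}$;
8         $\sigma_a := \sigma_a \cup \{a_{\mathit{Sub}}\}$;
9 return $\mathcal{V}'$;

Sub-routine AddBinPredicates

INPUT : a binary atom $R(a, b)$
OUTPUT: a dataset $\mathcal{V}'$

1 $\mathcal{V}' := \mathcal{V}$;
2 for each pair $a^* \in \sigma_a$ and $b^* \in \sigma_b$ do
3     if CheckRole($R(a^*, b^*)$, $\mathcal{V}'$) then $\mathcal{V}' := \mathcal{V}' \cup \{R(a^*, b^*)\}$
4 return $\mathcal{V}'$;

Sub-routine CheckRole

INPUT : a binary atom $R(a, b)$, a dataset $\mathcal{V}'$
OUTPUT: True or False

1 if $\mathcal{O}_e \cup \mathcal{V}' \cup \{R(a, b)\} \not\models \mathcal{P}(\vec{s})$ for each $\vec{s} \in \mathit{cert}(\mathcal{P}, \mathcal{O}, \mathcal{D})$
2 and $\mathcal{O}_e \cup \mathcal{V}' \cup \{R(a, b)\} \models C(c)$ implies $C(c) \in \mathcal{V}'$ for any unary predicate $C$ then
3     return True;
4 else return False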












C Appendix (Reduction) 

In this section, we show the reduction of the problem of uniform boundedness for binary Datalog to the problem of existence of optimal obstructions for Datalog CQE instances (see Section ??).

Let $\mathcal{O}$ be a binary Datalog ontology over a signature $\Sigma$ (observe that w.l.o.g. we can assume that $\mathcal{O}$ is connected). Then $\mathcal{O}$ is uniformly bounded if there is a constant $N$ such that, for every dataset $\mathcal{D}$ over $\Sigma$ and every ground atom $P(\vec{t})$, if the atom has a proof from $\mathcal{O}$ and $\mathcal{D}$, then it has a proof not longer than $N$. It is well known that each relation $P(\vec{x})$ defined by $\mathcal{O}$ is equivalent to an infinite union of CQs $\bigvee_{i=1}^{\infty} \varphi_i^P(\vec{x})$. Note that each $\varphi_i^P(\vec{x})$ is the result of applying some sequence of rules from $\mathcal{O}$ to $P(\vec{x})$. Moreover,

(P1) a Datalog ontology is uniformly bounded if and only if there exists a number $N$ such that each $P(\vec{x})$ is equivalent to $\bigvee_{i=1}^{N} \varphi_i^P(\vec{x})$.
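For intuition about (P1), a standard example of a binary Datalog program that is not uniformly bounded is transitive closure, $T(x, y) \leftarrow E(x, y)$ and $T(x, y) \leftarrow E(x, z) \wedge T(z, y)$: here $\varphi_i^T$ is the path of length $i$, and no finite prefix of the union suffices. The Python sketch below (a hypothetical helper, not from the source) counts the rounds of naive bottom-up evaluation that still derive new facts on a chain of $n$ edges; the count grows with $n$, so no single bound $N$ covers all datasets.

```python
from typing import Set, Tuple


def productive_rounds(n: int) -> int:
    """Naive evaluation of T(x,y) <- E(x,y); T(x,y) <- E(x,z), T(z,y) on a chain.

    Returns the number of rounds that derive at least one new T-fact on the
    dataset E = {(0,1), (1,2), ..., (n-1,n)}.
    """
    edges: Set[Tuple[int, int]] = {(i, i + 1) for i in range(n)}
    closure: Set[Tuple[int, int]] = set()
    rounds = 0
    while True:
        derived = set(edges) | {(x, y) for (x, z) in edges for (z2, y) in closure if z == z2}
        if derived <= closure:
            return rounds
        closure |= derived
        rounds += 1


print([productive_rounds(n) for n in range(1, 6)])  # [1, 2, 3, 4, 5]
```

After round $k$ the closure holds exactly the paths of length at most $k$, which mirrors the disjuncts $\varphi_1^T, \ldots, \varphi_k^T$.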

Now we are ready to provide the required reduction. Let $\mathcal{O}$ be a binary Datalog ontology. We construct a CQE instance $\mathcal{I} = (\mathcal{O}', \mathcal{D}, P)$ that admits an optimal obstruction if and only if $\mathcal{O}$ is uniformly bounded. The ontology $\mathcal{O}'$ of $\mathcal{I}$ is defined as

$$\mathcal{O} \cup \{R^A(a, x) \wedge A(x) \rightarrow P \mid A \text{ is unary and } A \in \Sigma\}$$

$$\cup\ \{R_1^S(a, x_1) \wedge R_2^S(a, x_2) \wedge S(x_1, x_2) \rightarrow P \mid S \text{ is binary and } S \in \Sigma\},$$

where all $R^A$, $R_1^S$, $R_2^S$, and $P$ are fresh predicates. The dataset $\mathcal{D}$ is equal to

$$\{A(a), S(a, a) \mid A \text{ is a unary and } S \text{ is a binary predicate from } \Sigma'\} \cup \{P\},$$

where $\Sigma'$ is $\Sigma$ extended with the fresh predicates $R^A$, $R_1^S$, $R_2^S$. Observe that this dataset admits any possible proof of $P$.
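The construction of $\mathcal{O}'$ and $\mathcal{D}$ is purely syntactic. The Python sketch below generates the rules added to $\mathcal{O}$ (as strings) and the atoms of $\mathcal{D}$ (as tuples) for a given signature; the names `R_A`, `R1_S`, and `R2_S` are hypothetical renderings of the fresh marker predicates, and the fact $P$ is encoded as the nullary tuple `("P",)`.

```python
from typing import List, Set, Tuple


def reduction_rules(unary: List[str], binary: List[str]) -> List[str]:
    """Rules added to O to obtain O' (one per predicate of the signature)."""
    rules = [f"R_{a}(a,x) & {a}(x) -> P" for a in unary]
    rules += [f"R1_{s}(a,x1) & R2_{s}(a,x2) & {s}(x1,x2) -> P" for s in binary]
    return rules


def reduction_dataset(unary: List[str], binary: List[str]) -> Set[Tuple[str, ...]]:
    """A(a) for unary and S(a,a) for binary predicates of Sigma', plus the fact P."""
    # Sigma' adds one fresh binary marker per unary and two per binary predicate.
    fresh = [f"R_{a}" for a in unary] + [f"R{i}_{s}" for s in binary for i in (1, 2)]
    atoms = {(a, "a") for a in unary} | {(s, "a", "a") for s in binary + fresh}
    atoms.add(("P",))
    return atoms


print(reduction_rules(["A"], ["E"]))
print(sorted(reduction_dataset(["A"], ["E"])))
```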

Let $\mathcal{Q}$ and $\mathcal{S}$ be built as in Definition ??. It is easy to see that $\mathcal{Q} \setminus \mathcal{S}$ contains the queries $\psi_i^A$ and $\psi_i^S$ of the form $\exists x.\, R^A(a, x) \wedge \varphi_i^A(x)$ and $\exists x_1 \exists x_2.\, R_1^S(a, x_1) \wedge R_2^S(a, x_2) \wedge \varphi_i^S(x_1, x_2)$, respectively, for each $A, S \in \Sigma$, as each of them compromises the policy with the help of $\mathcal{O}'$.

Assume that $\mathcal{O}$ is not uniformly bounded; then, due to Property (P1), there is some $Q \in \Sigma$ such that for any number $N$ we have that $\bigvee_{i=1}^{\infty} \varphi_i^Q \not\equiv \bigvee_{i=1}^{N} \varphi_i^Q$. That is, it is not the case that for each $\varphi_i^Q$ there exists $\varphi_j^Q$ with $j \le N$ such that there is a homomorphism from $\varphi_j^Q(\vec{x})$ to $\varphi_i^Q(\vec{x})$ (note that here distinguished variables are mapped into themselves). This immediately yields that, for each number $N$, it is not the case that for each $\psi_i^Q$ there exists $\psi_j^Q$ with $j \le N$ such that there is a homomorphism from $\psi_j^Q$ to $\psi_i^Q$ (note that, although here we do not have distinguished variables, the variables of $\psi_j^Q$ that correspond to distinguished variables of $\varphi_j^Q(\vec{x})$ are still mapped to the variables of $\psi_i^Q$ that correspond to distinguished variables of $\varphi_i^Q(\vec{x})$, since they are “marked” by the fresh predicates introduced for $Q$). Moreover, for every predicate $T$ different from $Q$ and for any $i$ and $j$, there is no homomorphism from $\psi_i^T$ to $\psi_j^Q$, since the former mentions the fresh predicates introduced for $T$ and the latter those introduced for $Q$. Hence, there is no finite pseudo-obstruction for $\mathcal{I}$, and therefore, due to Theorem 22, no optimal obstruction censor for $\mathcal{I}$ exists.

Assume that $\mathcal{O}$ is uniformly bounded and $N$ is a number such that, for any dataset, if a fact can be proved from $\mathcal{O}$ and the dataset, then there is a proof of this fact not longer than $N$. Let $\mathcal{T}$ be the subset of $\mathcal{Q} \setminus \mathcal{S}$ consisting of those Boolean CQs $\exists^* G$ where $G$ is a subgoal in some proof of $P$ in $\mathcal{O}' \cup \mathcal{D}$ of length at most $N + 3$. We claim that the UCQ $\mathcal{U} = \bigvee_{\varphi \in \mathcal{T}} \varphi$ is an optimal obstruction for $\mathcal{I}$. Assume that there exists a Boolean CQ $\psi = \exists^* G_0$ from $\mathcal{Q} \setminus \mathcal{S}$ with $G_0$ a subgoal coming from some proof of length greater than $N + 3$. By the maximality of $\mathcal{S}$, this means that $\mathcal{O}' \cup \mathcal{S} \cup \{\psi\} \models P$. Then there exists a proof $\pi$ of $P$ from $\mathcal{O}' \cup \Delta$, where $\Delta = [\psi] \cup \bigcup_{\varphi \in \mathcal{S}} [\varphi]$, of length at most $N + 3$ (1 step to apply one of the rules $R_1^S(a, x_1) \wedge R_2^S(a, x_2) \wedge S(x_1, x_2) \rightarrow P$ from $\mathcal{O}'$, $N$ steps to prove $S(x_1, x_2)$ using rules from $\mathcal{O} \cup \Delta$, and 2 additional steps to prove $R_1^S(a, b_1) \wedge R_2^S(a, b_2)$ using facts from $\Delta$, for some elements $b_1$ and $b_2$). W.l.o.g., we can assume that this proof is normalised. Recall that all the rules applied after the frontier are from $\Delta$. We can also assume w.l.o.g. that rules from $[\psi]$ are applied only at the very end of the proof. Clearly, the goal $G$ right before we start to apply the rules from $[\psi]$ is such that (i) $\exists^* G \in \mathcal{T}$ and (ii) there is a homomorphism from $\exists^* G$ to $\psi$. These properties imply that $\mathcal{T}$ is a pseudo-obstruction and, since it is finite, by Theorem 22 an optimal obstruction censor for $\mathcal{I}$ exists.





